QA Sample Size: How Many Calls You Need to Review
Your QA sample size is how many calls (or chats, or emails) you pull and score in quality assurance, out of everything you could have reviewed. Get it wrong in one direction and your QA scores are noise; get it wrong in the other and you burn review hours measuring something you already knew. The good news: for most operations the right number is smaller than people expect, and it has very little to do with how many calls you handle.
Why it matters
A QA score you can't trust is worse than no score — it drives decisions about coaching, agents and process on numbers that are mostly luck of the draw.
The surprising part
Accuracy comes from how many calls you review, not how many you handle. A centre taking a million calls needs barely more review than one taking ten thousand.
What this guide covers
What sample size means, the four things that actually drive it, how to set yours, the traps to avoid — and a calculator that does the maths for you.
What is QA sample size?
In contact centre quality assurance, your QA sample size is the number of interactions you review and score in a given period — say, the 80 calls you QA out of the 3,000 your team handled this month. You review a sample because scoring every call is impossible; the sample stands in for the whole. The question sample size answers is: how many do I need to score before the result is a fair read on the real thing?
In plain English
Sample size is “how many calls do I have to listen to before I can trust the number?” The answer is driven by how sure you want to be and how precise you need the figure to be, not by what share of your calls that adds up to.
✓ What sample size IS
- A number set by the confidence and precision you need from the result
- A random draw from all the interactions you could have reviewed
- Big enough that the score is a trustworthy read, small enough to be practical
✕ What it is NOT
- A fixed percentage of your call volume (“we QA 2% of calls”)
- Whatever calls were easiest to grab or happened to be flagged
- A case of “bigger is always better” — past a point, more reviews barely help
Why it matters in CX
Sample size is where a QA programme is quietly made or broken. The score itself looks the same whether it rests on 15 calls or 350 — but only one of them is safe to act on.
For CX & QA leaders
A score you can defend in front of the executive team — not a figure someone can wave away with “that's only a handful of calls.”
For contact centre leaders
The right amount of review effort. Too little and the score is luck; too much and you're paying senior people to measure something they already know.
For operations & finance
QA time is real money. Right-sizing the sample — rather than chasing a percentage — puts those hours where they actually change the number.
What drives sample size
Four things set the number. Notice that three of them are decisions about how good you need the answer to be — and only one is about your operation, which turns out to matter least of all.
Confidence
How often you want the range you report to contain the true score: typically 90%, 95% or 99%. Higher confidence is more cautious and needs more calls. Counter-intuitively, for the same sample it also gives a wider range, not a tighter one.
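To make the trade-off concrete, here is a minimal sketch assuming simple random sampling, a pass/fail score, the worst-case 50% fail rate and a call volume large enough to ignore. The helper names `z_for`, `calls_needed` and `margin_for` are hypothetical, and we can't confirm the QA Sample Size Calculator uses exactly this maths. It shows both effects described above: more calls needed for the same ±5%, and a wider margin from the same 385 calls.

```python
import math
from statistics import NormalDist


def z_for(confidence: float) -> float:
    """Two-sided critical value for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def calls_needed(confidence: float, margin: float, p: float = 0.5) -> int:
    """Calls needed for a +/- margin on a pass rate (large-volume approximation)."""
    z = z_for(confidence)
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


def margin_for(confidence: float, n: int, p: float = 0.5) -> float:
    """Margin of error achieved by n randomly drawn calls (large-volume approximation)."""
    return z_for(confidence) * math.sqrt(p * (1 - p) / n)


for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence: {calls_needed(conf, 0.05)} calls for ±5%, "
          f"or ±{margin_for(conf, 385):.1%} from 385 calls")
```

Under these assumptions the required sample goes from 271 calls at 90% to 385 at 95% and 664 at 99%.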
Precision (margin of error)
How far off the true score you can live with, written as ±x percentage points. A score of 80% at ±5% means the true score most likely sits somewhere between 75% and 85%. Tighter precision needs more calls.
Your total call volume
The one that barely matters. Once your volume is comfortably bigger than the sample, making it bigger changes the answer almost not at all — the same reason a national poll reads a whole country from a couple of thousand people.
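A quick way to see why volume barely matters is to apply the standard finite population correction (Cochran's formula) at 95% confidence and ±5%, assuming random sampling and a 50% fail rate. The function name `calls_needed` is our own, and this is a sketch of the textbook approach rather than the calculator's confirmed method.

```python
import math
from statistics import NormalDist


def calls_needed(volume: int, confidence: float = 0.95, margin: float = 0.05,
                 p: float = 0.5) -> int:
    """Cochran's sample size with a finite population correction."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z * z * p * (1 - p) / margin ** 2   # answer for an unlimited volume
    n = n0 / (1 + (n0 - 1) / volume)         # shrink slightly for a finite volume
    return math.ceil(n)


for volume in (3_000, 10_000, 100_000, 1_000_000):
    print(f"{volume:>9,} calls handled -> review {calls_needed(volume)}")
```

Going from 3,000 calls to a million moves the answer only from 341 to 384 reviews.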
Expected fail rate
How lopsided your pass/fail split is. A 50/50 split carries the most uncertainty and needs the most calls, so 50% is the safe assumption when you don't know your rate.
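For a pass/fail score drawn at random, the usual textbook way to combine all four factors is Cochran's sample-size formula with a finite population correction. We're assuming this standard formula here; any particular calculator may use a variant of it.

```latex
n_0 = \frac{z^2 \, p(1-p)}{e^2}, \qquad
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}
```

Here z is the two-sided critical value for your confidence level (about 1.645 at 90%, 1.96 at 95% and 2.576 at 99%), e is the margin as a proportion (0.05 for ±5%), p is the expected fail rate and N is the total call volume for the period. Because p(1 − p) is largest (0.25) at p = 0.5, assuming a 50% fail rate gives the biggest, safest n. N only appears in the correction term, and once N is much larger than n₀ that term is close to 1, which is why volume barely matters. At 95% and ±5%, n₀ = 1.96² × 0.25 / 0.05² ≈ 384.2, which rounds up to 385 calls.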
How to set your sample size
Decide what the QA is for
A read on how the team is doing overall needs a far smaller sample than fairly scoring individual agents. Be honest about which one you're doing — ranking people takes many more calls per person than most centres can manage.
Pick your confidence
95% is the standard and the right choice for almost everyone. 99% is rarely worth the extra calls for QA.
Pick your precision
±5% is a sensible default for a team-level score. Tighter than that is a lot more review for a difference you probably can't act on.
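To put numbers on “a lot more review”, this sketch holds confidence at 95% and assumes a 50% fail rate and a large call volume. The function name is hypothetical and the figures come from the standard large-sample formula, not from any specific calculator.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def calls_needed(margin: float, p: float = 0.5) -> int:
    """Calls needed at 95% confidence (large-volume approximation)."""
    return math.ceil(Z95 ** 2 * p * (1 - p) / margin ** 2)


for margin in (0.10, 0.05, 0.03, 0.02):
    print(f"±{margin:.0%}: {calls_needed(margin)} calls")
```

Tightening from ±5% (385 calls) to ±3% needs 1,068 calls, nearly three times the review effort.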
Read the number
Drop your figures into the QA Sample Size Calculator and it tells you how many calls to review — and how good your current sample already is.
Pull the calls at random
This is the step everyone skips. The number is only honest if every call had an equal chance of being picked — not the easy ones, not one team, not one part of the day.
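In practice, a fair draw means sampling from the full list of interactions for the period, not a filtered or convenient subset. The sketch below assumes you can export every call ID for the month; the ID format, the `draw_qa_sample` helper and the sample of 341 (the 95%, ±5% figure for 3,000 calls under the standard formula) are illustrative assumptions.

```python
import random
from typing import List, Optional


def draw_qa_sample(call_ids: List[str], n: int, seed: Optional[int] = None) -> List[str]:
    """Simple random sample: every call has the same chance of being picked."""
    if n > len(call_ids):
        raise ValueError("sample size is larger than the number of calls available")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible for audit
    return rng.sample(call_ids, n)


# Hypothetical call IDs standing in for a month of 3,000 interactions
month_of_calls = [f"CALL-{i:05d}" for i in range(1, 3001)]
sample = draw_qa_sample(month_of_calls, 341, seed=2024)
print(len(sample), sample[:3])
```

Recording the seed lets anyone re-run the draw later and confirm that nobody hand-picked the calls.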
Do the maths in seconds
The QA Sample Size Calculator works both ways: tell it how accurate you need to be and it gives you the number of calls, or tell it how many you already review and it tells you how much you can trust the result.
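The two directions are the same formula solved for different unknowns. Here is a minimal sketch of both, assuming random sampling, a pass/fail score and Cochran's formula with a finite population correction; the function names are hypothetical and we don't know whether the calculator computes it exactly this way.

```python
import math
from statistics import NormalDist


def _z(confidence: float) -> float:
    """Two-sided critical value for a confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def calls_needed(volume: int, confidence: float = 0.95, margin: float = 0.05,
                 p: float = 0.5) -> int:
    """Forward: how many calls to review for a target margin."""
    n0 = _z(confidence) ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / volume))


def margin_achieved(reviewed: int, volume: int, confidence: float = 0.95,
                    p: float = 0.5) -> float:
    """Reverse: the +/- margin a given number of reviewed calls supports."""
    fpc = math.sqrt((volume - reviewed) / (volume - 1))  # finite population correction
    return _z(confidence) * math.sqrt(p * (1 - p) / reviewed) * fpc


print(calls_needed(3_000))                    # calls for ±5% at 95%
print(f"±{margin_achieved(80, 3_000):.1%}")  # what 80 reviewed calls out of 3,000 gives
```

On these assumptions, the 80-out-of-3,000 example from earlier supports a margin of roughly ±10.8%, while about 341 calls would be needed for ±5%.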
What good sampling gives you
Defensible scores
A QA number you can stand behind, with the maths to back it.
Right-sized effort
Review enough to be sure — and not a single call more.
Fairer team reads
Know when a sample is solid enough to compare, and when it isn't.
Spot real change
Tell a genuine shift in quality from ordinary month-to-month wobble.
Less wasted time
Stop QA-ing a percentage that's far larger than you ever needed.
Credibility
Numbers leaders trust, because they hold up to a challenge.
Common pitfalls
QA-ing a percentage of calls
“We review 2% of calls” is the most common rule of thumb and the least defensible. A percentage of a big operation is wildly more than you need; a percentage of a small one can be far too few. Sample size isn't a share — it's a number set by confidence and precision.
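To see how unevenly a percentage rule behaves, this sketch applies “2% of calls” to three hypothetical volumes and reports the 95% margin each sample actually delivers, assuming a random draw and a 50% fail rate. The volumes and the `margin_achieved` helper are illustrative.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def margin_achieved(reviewed: int, volume: int, p: float = 0.5) -> float:
    """95% margin of error for `reviewed` random calls out of `volume`."""
    fpc = math.sqrt((volume - reviewed) / (volume - 1))
    return Z95 * math.sqrt(p * (1 - p) / reviewed) * fpc


for volume in (2_000, 20_000, 500_000):
    reviewed = round(volume * 0.02)  # the "we QA 2% of calls" rule
    print(f"{volume:>7,} calls, 2% = {reviewed:>6,} reviewed -> "
          f"±{margin_achieved(reviewed, volume):.1%}")
```

The small centre ends up with a margin of about ±15%, while the large one reviews 10,000 calls for about ±1% when roughly 384 would have given ±5%.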
Cherry-picking the sample
Reviewing the easy calls, one team, or one slot of the day skews the result no matter how many you check. A big but lopsided sample doesn't average out — it just makes a wrong read look more convincing.
Ranking agents on a handful of calls
Splitting a team-sized sample across individuals leaves only a few calls each — nowhere near enough to fairly separate one agent from another. Use small samples to coach and to read the team, not to build a league table.
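As a rough illustration, suppose a 385-call team sample is split across 20 agents, giving about 19 calls each. The agent scores below are hypothetical, and the simple normal-approximation range used here is itself shaky at this size (it tends to be too narrow), so if anything it flatters the comparison.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def pass_rate_range(passed: int, reviewed: int) -> tuple[float, float]:
    """Rough 95% range for one agent's pass rate (normal approximation)."""
    rate = passed / reviewed
    margin = Z95 * math.sqrt(rate * (1 - rate) / reviewed)
    return max(0.0, rate - margin), min(1.0, rate + margin)


# Hypothetical agents, 19 reviewed calls each
agent_a = pass_rate_range(17, 19)  # about 89% passed
agent_b = pass_rate_range(14, 19)  # about 74% passed
overlap = agent_a[0] <= agent_b[1]
print(f"A: {agent_a[0]:.0%}-{agent_a[1]:.0%}  B: {agent_b[0]:.0%}-{agent_b[1]:.0%}  "
      f"ranges overlap: {overlap}")
```

A 15-point gap looks decisive on a league table, yet the two ranges overlap heavily, so the data can't fairly say which agent is better.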
Reaching for 99% confidence
It feels like “more accurate,” but higher confidence widens the range for a given sample — or demands a lot more calls to hold the range steady. For QA, 95% is almost always the right call.
How to know you got it right
Check the margin you actually achieved
Run the sample you reviewed back through the calculator. If the margin on the score is tight enough to act on, you reviewed enough; if it's wide, treat the score as a rough signal.
Re-draw at random every period
A good sample this month doesn't make next month's automatic. Pull a fresh random selection each cycle rather than always landing on the same calls or agents.
Don't split hairs inside the margin
If your score moves from 82% to 84% but your margin is ±5%, you can't tell that move apart from ordinary sampling wobble. Only call it a real change when it clears the margin.
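A slightly stricter version of this check compares the two months directly. Assuming each month is an independent random sample, the margin on the difference between them is wider than either month's own margin (roughly 1.4 times when sizes and rates are similar). This is the standard two-proportion comparison; the function name and the example figures are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def change_is_clear(rate_before: float, n_before: int,
                    rate_after: float, n_after: int) -> bool:
    """True if the month-on-month change clears the 95% margin on the difference."""
    se_diff = math.sqrt(rate_before * (1 - rate_before) / n_before
                        + rate_after * (1 - rate_after) / n_after)
    return abs(rate_after - rate_before) > Z95 * se_diff


print(change_is_clear(0.82, 385, 0.84, 385))  # 2-point rise
print(change_is_clear(0.82, 385, 0.90, 385))  # 8-point rise
```

With 385 calls each month, the 2-point rise falls well inside the roughly ±5.3-point margin on the difference, while an 8-point rise clears it.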
The honest test
A sample is “big enough” when two things are true: the margin on the result is tight enough to make the decision in front of you, and the calls were pulled at random. Miss either one and more calls won't save you.
Frequently asked questions
How many calls should I review per agent each month?
There's no universal number — it depends on what you're using the score for. For a read on the team, you need far fewer per agent than you'd think. For fairly scoring individuals, you need many more calls each than most centres can realistically review, which is why agent rankings built on a handful of calls are so often unfair.
Should I just QA a fixed percentage of calls?
No. A percentage rule is the most common approach and the weakest. The number of calls you need is set by how confident and precise you want to be — not by your volume. A fixed percentage almost always means a large centre reviews far more than necessary while a small one reviews too few.
Does a bigger contact centre need a bigger sample?
Barely. Once your call volume is comfortably larger than your sample, making it larger changes the required number almost not at all. Reviewing a few hundred calls reads quality about the same whether you handle ten thousand a month or a million.
What confidence level should I use?
95% is the industry standard and the right setting for almost every QA programme. 90% needs fewer calls but is less certain; 99% is usually overkill and quietly demands a lot more review for little practical gain.
What's a good margin of error?
±5% is a sensible, defensible target for a team-level QA score. Treat it as context, not a rule — tighter precision is a lot more review effort for a difference you often can't act on anyway.
Can I use these samples to rank individual agents?
Usually not. Spreading a team-sized sample across agents leaves only a few calls each — far too few to tell a genuinely better agent from a worse one. Small samples are for coaching and for reading the team, not for ranking people against each other.
What if I don't know my fail rate?
Leave it at 50%. That's the most conservative assumption — it carries the most uncertainty and so produces a sample big enough to be safe whatever your true rate turns out to be.
Is a bigger sample always better?
No. Past the point where your margin is tight enough to act on, extra reviews add very little. And a big sample that was chosen badly — only easy calls, one team, one shift — is less trustworthy than a smaller one drawn at random.
Final thoughts
Sample size is the quiet foundation under every QA score, and most contact centres set it on instinct — a round percentage, a habit, a number that “feels about right.” That's how programmes end up either drowning in review work or acting on scores that are mostly noise.
The honest version is simpler than the folklore. Decide how sure and how precise you need to be, read off the number, and pull the calls at random. Your total volume, the thing people fixate on, barely enters into it — and a giant sample chosen badly will mislead you more confidently than a small one chosen well.
Get those two things right — enough calls, picked fairly — and your QA score stops being a talking point people argue about and starts being a number you can actually run the operation on.