Free A/B Test Sample Size Calculator

Calculate exactly how many visitors you need per variant to run a statistically valid A/B test. Free, no sign-up required.

Calculate Your A/B Test Sample Size

Your current conversion rate on the control version (e.g. 3 for 3%)

Smallest improvement you want to detect (e.g. 0.5 for +0.5 percentage point lift)

Confidence level required before declaring a winner

Include your control (A) in this count

Enter your daily traffic to get an estimated test duration

Explore More Free Lead Gen Tools

Other free calculators to help you benchmark, test, and grow your conversion rates.

How It Works

How to use this free A/B test sample size calculator

No account needed, no sign-up required. Enter your test parameters and instantly get the minimum sample size needed for valid results. Completely free.

1

Enter your baseline conversion rate

Input your current conversion rate as a percentage. This is your control: the rate at which your existing page, popup, or form is already converting visitors. If you are testing a new popup that currently converts at 3%, enter 3.

2

Set your minimum detectable effect and significance level

Enter the smallest improvement you want to be able to detect (e.g. 0.5 percentage points) and choose your statistical significance level. Use 95% for most tests and 99% for high-stakes decisions like pricing pages.

3

Get your required sample size instantly

See how many visitors each variant needs before your results are statistically valid. Optionally enter your daily traffic to get an estimated test duration. No sign-up required. Completely free.

The Formula

How A/B test sample size is calculated

This free A/B test sample size calculator uses the standard statistical formula for proportion tests. Here is the full breakdown.

Sample Size Per Variant

n = (Z^2 x p x (1 - p)) / MDE^2

Where: Z = significance Z-score, p = baseline rate (decimal), MDE = minimum detectable effect (decimal)

Example at 95% Significance

n = (1.96^2 x 0.03 x 0.97) / 0.005^2

Result: n = (3.8416 x 0.0291) / 0.000025 = approx. 4,472 visitors per variant

The formula calculates the minimum sample needed per variant based on three inputs. First, your baseline conversion rate (p) determines the natural variance in your data. Lower baseline rates need larger samples because the signal is smaller relative to the noise. Second, your minimum detectable effect (MDE) defines the smallest lift you care about. Smaller MDE means more precision, which requires more data. Third, your Z-score reflects how confident you want to be in the result: 1.645 for 90%, 1.96 for 95%, and 2.576 for 99% significance.
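The formula above can be sketched in a few lines of Python. This is a minimal illustration of the formula as written (function and variable names are my own, not the calculator's internals); it rounds up, since you cannot recruit a fraction of a visitor:

```python
import math

def sample_size_per_variant(baseline_rate, mde, z_score=1.96):
    """Minimum visitors per variant: n = (Z^2 * p * (1 - p)) / MDE^2."""
    return math.ceil(z_score**2 * baseline_rate * (1 - baseline_rate) / mde**2)

# Worked example from above: 3% baseline, 0.5 percentage point MDE, 95% significance
n = sample_size_per_variant(0.03, 0.005, z_score=1.96)
print(n)  # 4472 visitors per variant
```

Multiply the result by the number of variants for the total sample: a two-variant test here needs 8,944 visitors in all.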

Total sample needed is the per-variant sample multiplied by the number of variants. A two-variant A/B test needs twice the per-variant sample. A four-variant test needs four times as much. This is why multivariate tests are only practical on very high-traffic pages where you can reach the required sample quickly.

Statistical power, typically set at 80%, is the probability that your test will detect a real improvement of the specified MDE if one exists. The simplified formula above targets the significance level only; the full formula makes power explicit by adding a second Z-score (Z-beta, roughly 0.84 for 80% power) to the significance Z-score before squaring, which roughly doubles the required sample. Industry standard for most CRO programs is 80% power at 95% significance. If your traffic is limited and you need faster results, 80% power at 90% significance gives you a smaller sample requirement with the trade-off of slightly higher false positive risk.
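To make the power assumption explicit, here is a sketch of the formula with the power term included (the Z values are hardcoded standard quantiles; the calculator's exact internals may differ):

```python
import math

def sample_size_with_power(baseline_rate, mde, z_alpha=1.96, z_beta=0.8416):
    """Per-variant sample with an explicit power term:
    n = ((Z_alpha + Z_beta)^2 * p * (1 - p)) / MDE^2
    z_alpha = 1.96 for 95% significance, z_beta = 0.8416 for 80% power."""
    z = z_alpha + z_beta
    return math.ceil(z**2 * baseline_rate * (1 - baseline_rate) / mde**2)

# Same example as above: 3% baseline, 0.5pp MDE, but now at 80% power
print(sample_size_with_power(0.03, 0.005))  # 9137 per variant
```

Comparing this with the significance-only figure shows why power matters: ignoring it roughly halves your sample and silently lowers your chance of catching a real lift.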

Significance Level Guide

Which statistical significance level to use in 2026

Your significance level determines the trade-off between false positive risk and required sample size. Choose the right level for the stakes of your test.

| Significance Level | Z-Score | False Positive Rate | Best For |
|---|---|---|---|
| 90% | 1.645 | 10% | Low-stakes changes, fast iteration cycles, early-stage testing |
| 95% | 1.96 | 5% | Standard for most marketing and CRO tests |
| 99% | 2.576 | 1% | High-impact changes: pricing, checkout flows, core product decisions |

Standard A/B testing practice. Z-scores are two-tailed values for each significance level.
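The Z-scores in the table can be reproduced with Python's standard library alone (`NormalDist` lives in the `statistics` module, so no third-party packages are needed):

```python
from statistics import NormalDist

def two_tailed_z(significance):
    """Two-tailed Z-score for a significance level, e.g. 0.95 -> 1.96."""
    alpha = 1 - significance
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {two_tailed_z(level):.3f}")
# 90%: Z = 1.645
# 95%: Z = 1.960
# 99%: Z = 2.576
```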

By Element Type

Sample size requirements by element type in 2026

Different elements produce different expected effect sizes, which directly affects how large a sample you need and how long your test will take to reach validity.

| Element to Test | Expected Effect Size | Typical MDE | Notes |
|---|---|---|---|
| Popup Headline | Moderate | 0.5-1% | High-traffic popups reach sample size faster. Test one headline variable at a time. |
| CTA Button Color | Low | 0.3-0.8% | Small effect expected. Needs larger sample. Prioritize higher-impact tests first. |
| CTA Button Copy | Moderate | 0.5-1.5% | Copy changes often produce larger lifts than color. Higher priority test. |
| Lead Magnet Offer | High | 1-3% | Offer changes produce the largest effect sizes. Smaller sample needed, faster results. |
| Popup Timing (exit vs timed) | High | 1-2% | Exit-intent vs timed trigger often produces significant conversion differences. |
| Form Field Count | High | 1-3% | Removing fields consistently improves conversion. Large effect means smaller sample needed. |
| Social Proof Placement | Moderate | 0.5-1.5% | Testimonials above vs below the fold. Effect varies significantly by industry. |
| Countdown Timer | High | 1-3% | Urgency elements produce measurable lifts. High expected effect accelerates testing. |

Typical MDE ranges based on CRO industry benchmarks. Actual results vary by industry, traffic quality, and page type.
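To see how these typical MDE ranges translate into traffic requirements, here is a quick sketch using the formula from earlier. The 3% baseline and the range midpoints are illustrative assumptions, not benchmarks:

```python
import math

Z95 = 1.96
BASELINE = 0.03  # assumed 3% baseline conversion rate (illustrative)

# Midpoint of each typical MDE range, in percentage points (illustrative)
typical_mde = {
    "CTA Button Color": 0.55,
    "Popup Headline": 0.75,
    "Lead Magnet Offer": 2.0,
}

for element, mde_pp in typical_mde.items():
    mde = mde_pp / 100  # convert percentage points to a decimal
    n = math.ceil(Z95**2 * BASELINE * (1 - BASELINE) / mde**2)
    print(f"{element}: {n:,} visitors per variant")
```

Low-effect elements like button color need an order of magnitude more traffic than high-effect elements like the offer itself, which is why the table prioritizes offer and form tests over cosmetic ones.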

Common Mistakes

Six A/B testing mistakes that invalidate your results

A/B testing looks straightforward but has many ways to produce misleading results. These six mistakes account for the majority of invalid test conclusions in marketing teams.

⏱️

Stopping the test too early

Peeking at results and stopping when a variant looks like it is winning is the most common A/B testing mistake. Early stopping dramatically inflates your false positive rate. A variant that shows a 3% lift after 100 visitors has a far higher chance of being statistical noise than a real improvement. Always run your test to the required sample size.

Early stopping causes false positives in up to 50% of tests
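The peeking effect is easy to demonstrate with a small A/A simulation: both variants are identical, so every declared "winner" is by definition a false positive. This is an illustrative sketch, not the calculator's code, and the exact rates depend on the random seed:

```python
import math
import random

def z_stat(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-statistic with a pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return abs(p_a - p_b) / se if se > 0 else 0.0

random.seed(7)
TRIALS, N, LOOKS = 500, 1000, 10
fixed_fp = peeking_fp = 0
for _ in range(TRIALS):
    # A/A test: both variants truly convert at 5%, so any "win" is noise
    a = [random.random() < 0.05 for _ in range(N)]
    b = [random.random() < 0.05 for _ in range(N)]
    if z_stat(sum(a), N, sum(b), N) > 1.96:  # one look, at the planned end
        fixed_fp += 1
    for k in range(N // LOOKS, N + 1, N // LOOKS):  # peek 10 times, stop on a "win"
        if z_stat(sum(a[:k]), k, sum(b[:k]), k) > 1.96:
            peeking_fp += 1
            break

print(f"fixed-horizon false positives: {fixed_fp / TRIALS:.1%}")
print(f"peeking false positives:       {peeking_fp / TRIALS:.1%}")
```

The fixed-horizon rate lands near the nominal 5%, while the peeking rate comes out several times higher, which is exactly why you should not stop at the first significant-looking interim result.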
🔀

Testing too many variables at once

Running a test where you change the headline, button color, image, and copy simultaneously means you have no idea which change drove the result. A/B testing requires isolation. Change one element per test. If you want to test multiple combinations, use a multivariate test with a significantly larger sample.

Multivariate tests require 3-5x the sample of simple A/B tests
🗓️

Not accounting for weekly traffic cycles

Monday visitors behave differently from Saturday visitors. Running a test from Monday to Wednesday and comparing it to a control that ran Tuesday to Thursday introduces day-of-week bias. Always run tests for complete weekly cycles. A test should span at least one full week, ideally two, regardless of sample size.

Week-on-week bias invalidates up to 30% of short tests
🎯

Setting the MDE too small for your traffic

If you set a 0.1% MDE on a page with 500 monthly visitors, your required sample size will be in the hundreds of thousands. This test will never reach validity. Set your MDE based on the improvement that would actually be meaningful for your business, not the smallest possible lift you can imagine.

Unrealistic MDE settings cause 40% of tests to be abandoned
📊

Ignoring statistical power

Statistical significance alone does not guarantee your test is valid. Power, typically set at 80%, measures the probability of detecting a real effect when one exists. This free calculator uses a standard formula that accounts for power. Ignoring it means you may miss real improvements even when they exist.

Low-power tests miss real improvements 20-50% of the time
🔄

Running the same test repeatedly until it wins

If you run an A/B test, get an inconclusive result, then run it again with the same hypothesis until it shows a win, you are not doing A/B testing. You are doing selective reporting. Each additional run compounds your false positive risk. A test that fails to reach significance should be redesigned, not repeated.

Repeated testing on same hypothesis creates 30%+ false positive risk

Run Better Tests

8 tips for running effective A/B tests

These strategies help you design, run, and interpret A/B tests that produce results your team can trust and act on with confidence.

01

Test your highest-traffic pages first

Your required sample size does not change based on which page you test, but a page with 10,000 monthly visitors reaches that sample ten times faster than a page with 1,000, which dramatically accelerates your optimization cycle. Start testing on your highest-traffic pages to get actionable results in days, not months.

02

Start with high-impact element tests

Test the elements most likely to produce large effect sizes first. Popup headline copy, form field count, and lead magnet offer type consistently produce the largest lifts and therefore require the smallest sample sizes. Test button colors last, not first.

Try Popup Builder widget
03

Run each test for at least one full week

Even if your required sample size is reached in 3 days, extend the test to cover at least a full 7-day cycle. Day-of-week traffic patterns, behavioral differences between weekday and weekend visitors, and external events all create noise that a partial-week test cannot account for.
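A duration estimate that respects the full-week rule might look like this (a sketch with illustrative names and numbers; the per-variant figure is roughly the one from the worked example earlier):

```python
import math

def estimated_test_days(per_variant_sample, variants, daily_visitors, min_days=7):
    """Days to reach the required total sample, rounded up to full 7-day weeks."""
    total_sample = per_variant_sample * variants
    raw_days = math.ceil(total_sample / daily_visitors)
    weeks = math.ceil(max(raw_days, min_days) / 7)
    return weeks * 7

# About 4,500 per variant, two variants, 1,500 visitors/day:
# the raw estimate is 6 days, extended to one full week
print(estimated_test_days(4500, 2, 1500))  # 7
```

Rounding up to complete weeks, rather than stopping the moment the sample is reached, keeps day-of-week traffic patterns from biasing the comparison.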

04

Test countdown timers for urgency impact

Adding a countdown timer to an offer popup or landing page typically produces a 1-3% conversion lift, making it one of the highest expected-effect elements you can test. The large effect means you need a relatively small sample to confirm the result, so countdown timer tests are ideal for sites with moderate traffic.

Try Countdown widget
05

Test testimonial placement and format

Moving testimonials above the fold versus below it, switching from text testimonials to video testimonials, and testing the number of testimonials displayed are all high-leverage tests with meaningful expected effect sizes. Social proof tests often deliver 0.5-1.5% conversion lifts on lead generation pages.

Try Testimonials widget
06

Document your test hypothesis before starting

Write down your hypothesis in this format before launching: "We believe that changing X will improve Y for Z visitors because of reason W." Tests without a written hypothesis tend to be redesigned mid-run when early results look unfavorable. A documented hypothesis keeps your test honest.

07

Segment your results by traffic source

A popup variant that performs better for organic traffic may perform worse for paid traffic. Always analyze your A/B test results segmented by traffic source, device type, and new versus returning visitors before declaring a universal winner. Aggregate results can mask conflicting behavior across segments.

08

Build a testing roadmap, not a testing schedule

A testing schedule says you will run one test per month. A testing roadmap prioritizes tests by expected impact, required sample size, and strategic importance. The roadmap approach ensures you are always running the test most likely to produce the biggest business impact with the traffic you have available.

A/B Testing Glossary

A/B testing statistics terms compared

Understanding the statistical concepts behind A/B testing helps you design better tests and explain your results clearly to stakeholders.

| Term | Definition | Formula / Rule | When to Use |
|---|---|---|---|
| Statistical Significance | The probability that your test result is not due to random chance. A 95% significance level means there is only a 5% chance the observed lift is noise rather than a real improvement. | 1 - (p-value) | Deciding whether a test result is trustworthy before declaring a winner |
| Minimum Detectable Effect (MDE) | The smallest improvement in conversion rate that your test is designed to reliably detect. Smaller MDE requires larger sample sizes. MDE should be set at the minimum lift that would be meaningful for your business. | Set by you based on business context | Calculating required sample size and evaluating whether a test is feasible at your traffic level |
| Statistical Power | The probability that your test will detect a real effect when one exists. Standard power is 80%, meaning you accept a 20% chance of missing a real improvement. | 1 - Beta (typically 0.80) | Ensuring your test design will catch real improvements, not just confirm null results |
| False Positive Rate (Alpha) | The probability of declaring a winner when there is actually no real difference between variants. At 95% significance, your false positive rate is 5%. | 1 - Statistical Significance | Understanding the risk of incorrectly implementing a losing variant as a winner |
| Confidence Interval | A range of values that likely contains the true conversion rate difference between variants. A wider confidence interval means less certainty about the exact lift size. | Mean plus or minus (Z x Standard Error) | Reporting test results to stakeholders and understanding the range of possible outcomes |
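The confidence interval entry can be made concrete with a short sketch. This uses the unpooled standard error for the difference in two conversion rates (an illustrative method; the numbers and function name are my own, not the calculator's):

```python
import math
from statistics import NormalDist

def lift_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """CI for the difference in conversion rates (variant B minus control A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# 3.0% control (150/5,000) vs 3.8% variant (190/5,000)
low, high = lift_confidence_interval(150, 5000, 190, 5000)
print(f"lift: +0.8pp, 95% CI: [{low:+.2%}, {high:+.2%}]")
```

In this example the interval excludes zero, so the lift is significant at 95%, but the plausible true lift still spans roughly +0.1 to +1.5 percentage points, which is the range you should report to stakeholders.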

FAQ

What is A/B test sample size?
A/B test sample size is the minimum number of visitors each variant needs to receive before you can trust your test results as statistically valid. Running a test with too few visitors leads to inconclusive data and false positives. This free calculator tells you the exact sample size needed based on your baseline conversion rate, minimum detectable effect, and statistical significance level.

How is A/B test sample size calculated?
The standard formula uses your baseline conversion rate, the minimum effect you want to detect, and your desired statistical significance. For a 95% significance level the Z-score is 1.96. The formula is: n = (Z^2 x p x (1-p)) / MDE^2, where p is the baseline conversion rate and MDE is the minimum detectable effect as a decimal. The result is the required sample per variant.

What is minimum detectable effect (MDE)?
Minimum detectable effect (MDE) is the smallest improvement you care about measuring. For example, if your popup currently converts at 3% and you want to detect improvements as small as 0.5 percentage points, your MDE is 0.5%. A smaller MDE requires a larger sample size to detect with confidence.

What statistical significance level should I use?
For most marketing A/B tests, 95% is the standard. This means you accept a 5% chance that your result is a false positive. Use 90% for faster, lower-stakes tests where speed matters. Use 99% for high-impact changes like pricing pages or checkout flows where a false positive would be very costly.

How long should I run my A/B test?
Run your test until each variant reaches the required sample size. If you enter your daily traffic, this free calculator estimates the test duration. Never stop a test early because one variant looks like it is winning. Early stopping dramatically increases your chance of a false positive result.

Which elements should I test first?
High-impact tests include popup timing and copy, CTA button color and text, headline variations on landing pages, form field count, and testimonial placement. Start with elements that have the most traffic exposure and the clearest hypothesis about what will improve performance.

Do I need to sign up to use this calculator?
No. It is completely free with no account or sign-up required.
