Calculate exactly how many visitors you need per variant to run a statistically valid A/B test. Free, no sign-up required.
Your current conversion rate on the control version (e.g. 3 for 3%)
Smallest improvement you want to detect (e.g. 0.5 for +0.5 percentage point lift)
Confidence level required before declaring a winner
Include your control (A) in this count
Enter your daily traffic to get an estimated test duration
Other free calculators to help you benchmark, test, and grow your conversion rates.
Calculate your cost per lead and see if your campaigns are generating leads efficiently.
Calculate your email list growth rate and project future list size.
Calculate the return on investment from your lead magnets.
Generate high-converting call-to-action copy for your buttons and forms.
Grade your landing page and get a score with actionable improvement tips.
Calculate your form conversion rate and find ways to get more completions.
Estimate the revenue impact of adding exit-intent popups to your site.
Calculate the ROI of your popup campaigns and lead capture widgets.
Score your leads across six key attributes and instantly identify your hottest prospects.
Calculate the ROI of your webinars including leads generated and expected revenue.
Calculate your newsletter signup rate and project annual subscriber growth.

How It Works
No account needed, no sign-up required. Enter your test parameters and instantly get the minimum sample size needed for valid results. Completely free.
Input your current conversion rate as a percentage. This is your control: the rate at which your existing page, popup, or form is already converting visitors. If you are testing a new popup that currently converts at 3%, enter 3.
Enter the smallest improvement you want to be able to detect (e.g. 0.5 percentage points) and choose your statistical significance level. Use 95% for most tests and 99% for high-stakes decisions like pricing pages.
See how many visitors each variant needs before your results are statistically valid. Optionally enter your daily traffic to get an estimated test duration. No sign-up required. Completely free.
The Formula
This free A/B test sample size calculator uses the standard statistical formula for proportion tests. Here is the full breakdown.
Sample Size Per Variant
n = (Z² × p × (1 − p)) / MDE²
Where: Z = significance Z-score, p = baseline rate (decimal), MDE = minimum detectable effect (decimal)
Example at 95% Significance
n = (1.96² × 0.03 × 0.97) / 0.005²
Result: n = (3.8416 × 0.0291) / 0.000025 ≈ 4,472 visitors per variant
The formula calculates the minimum sample needed per variant based on three inputs. First, your baseline conversion rate (p) determines the natural variance in your data. Lower baseline rates need larger samples because the signal is smaller relative to the noise. Second, your minimum detectable effect (MDE) defines the smallest lift you care about. Smaller MDE means more precision, which requires more data. Third, your Z-score reflects how confident you want to be in the result: 1.645 for 90%, 1.96 for 95%, and 2.576 for 99% significance.
Total sample needed is the per-variant sample multiplied by the number of variants. A two-variant A/B test needs twice the per-variant sample. A four-variant test needs four times as much. This is why multivariate tests are only practical on very high-traffic pages where you can reach the required sample quickly.
This calculator assumes 80% statistical power, meaning there is an 80% probability that your test will detect a real improvement of the specified MDE if one exists. Accounting for power adds a second Z-score (Z-beta, 0.84 for 80% power) to the significance Z in the numerator, which is why power-adjusted sample sizes come out larger than the simplified formula above suggests. Industry standard for most CRO programs is 80% power at 95% significance. If your traffic is limited and you need faster results, 80% power at 90% significance gives you a smaller sample requirement with the trade-off of slightly higher false positive risk.
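To make the arithmetic concrete, here is a minimal Python sketch of the calculation (the function names and traffic figures are illustrative, not the calculator's actual implementation). It includes the optional Z-beta power term; pass zero to reproduce the simplified formula shown above:

```python
import math

# Two-tailed Z-scores for common significance levels
Z_ALPHA = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}
Z_BETA_80 = 0.84  # Z-score corresponding to 80% statistical power

def sample_size_per_variant(baseline, mde, significance=0.95, z_beta=Z_BETA_80):
    """Minimum visitors per variant for a proportion test.

    baseline and mde are decimals: 3% -> 0.03, a 0.5 pp lift -> 0.005.
    Pass z_beta=0 to match the simplified formula shown above.
    """
    z = Z_ALPHA[significance] + z_beta
    n = (z ** 2) * baseline * (1 - baseline) / (mde ** 2)
    return math.ceil(n)

def estimated_duration_days(n_per_variant, variants, daily_visitors):
    """Days to reach the total sample, assuming traffic splits evenly."""
    total = n_per_variant * variants
    return math.ceil(total / daily_visitors)

# Worked example: 3% baseline, 0.5 pp MDE, 95% significance
n_simple = sample_size_per_variant(0.03, 0.005, z_beta=0)  # ~4,472 (simplified)
n_powered = sample_size_per_variant(0.03, 0.005)           # ~9,126 with 80% power
print(n_simple, n_powered, estimated_duration_days(n_powered, 2, 800))
```

Note how including the power term roughly doubles the required sample in this example; that gap is the difference between the bare significance formula and what a power-aware calculator reports.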
Significance Level Guide
Your significance level determines the trade-off between false positive risk and required sample size. Choose the right level for the stakes of your test.
| Significance Level | Z-Score | False Positive Rate | Best For |
|---|---|---|---|
| 90% | 1.645 | 10% | Low-stakes changes, fast iteration cycles, early-stage testing |
| 95% | 1.96 | 5% | Standard for most marketing and CRO tests |
| 99% | 2.576 | 1% | High-impact changes: pricing, checkout flows, core product decisions |
Standard A/B testing practice. Z-scores are for two-tailed tests; recommended sample sizes additionally assume 80% statistical power.
By Element Type
Different elements produce different expected effect sizes, which directly affects how large a sample you need and how long your test will take to reach validity.
| Element to Test | Expected Effect Size | Typical MDE (pp) | Notes |
|---|---|---|---|
| Popup Headline | Moderate | 0.5-1 | High-traffic popups reach sample size faster. Test one headline variable at a time. |
| CTA Button Color | Low | 0.3-0.8 | Small effect expected. Needs larger sample. Prioritize higher-impact tests first. |
| CTA Button Copy | Moderate | 0.5-1.5 | Copy changes often produce larger lifts than color. Higher priority test. |
| Lead Magnet Offer | High | 1-3 | Offer changes produce the largest effect sizes. Smaller sample needed, faster results. |
| Popup Timing (exit vs timed) | High | 1-2 | Exit-intent vs timed trigger often produces significant conversion differences. |
| Form Field Count | High | 1-3 | Removing fields consistently improves conversion. Large effect means smaller sample needed. |
| Social Proof Placement | Moderate | 0.5-1.5 | Testimonials above vs below the fold. Effect varies significantly by industry. |
| Countdown Timer | High | 1-3 | Urgency elements produce measurable lifts. High expected effect accelerates testing. |
Typical MDE ranges based on CRO industry benchmarks. Actual results vary by industry, traffic quality, and page type.
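To see how these effect sizes translate into sample requirements, here is a short illustrative Python loop applying the power-adjusted formula from the sketch above. A 3% baseline is assumed for every element, and the MDE values are representative picks from the table, not measured data:

```python
import math

Z = 1.96 + 0.84  # 95% significance plus 80% power
BASELINE = 0.03  # assumed baseline conversion rate

for element, mde_pp in [("CTA button color", 0.3), ("Popup headline", 0.5),
                        ("Lead magnet offer", 1.0), ("Form field count", 3.0)]:
    mde = mde_pp / 100  # percentage points -> decimal
    n = math.ceil(Z ** 2 * BASELINE * (1 - BASELINE) / mde ** 2)
    print(f"{element:17} MDE {mde_pp} pp -> {n:>7,} visitors per variant")
```

Because the MDE is squared in the denominator, a 10× larger detectable effect cuts the required sample by 100×, which is why high-effect elements like offers and form fields are so much cheaper to test than button colors.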
Common Mistakes
A/B testing looks straightforward but has many ways to produce misleading results. These six mistakes account for the majority of invalid test conclusions in marketing teams.
Peeking at results and stopping when a variant looks like it is winning is the most common A/B testing mistake. Early stopping dramatically inflates your false positive rate, as the simulation at the end of this section shows. A variant that shows a 3% lift after 100 visitors has a far higher chance of being statistical noise than a real improvement. Always run your test to the required sample size.
Early stopping causes false positives in up to 50% of tests.

Running a test where you change the headline, button color, image, and copy simultaneously means you have no idea which change drove the result. A/B testing requires isolation. Change one element per test. If you want to test multiple combinations, use a multivariate test with a significantly larger sample.

Multivariate tests require 3-5x the sample of simple A/B tests.

Monday visitors behave differently from Saturday visitors. Running a test from Monday to Wednesday and comparing it to a control that ran Tuesday to Thursday introduces day-of-week bias. Always run tests for complete weekly cycles. A test should span at least one full week, ideally two, regardless of sample size.

Week-on-week bias invalidates up to 30% of short tests.

If you set a 0.1% MDE on a page with 500 monthly visitors, your required sample size will be in the hundreds of thousands. This test will never reach validity. Set your MDE based on the improvement that would actually be meaningful for your business, not the smallest possible lift you can imagine.

Unrealistic MDE settings cause 40% of tests to be abandoned.

Statistical significance alone does not guarantee your test is valid. Power, typically set at 80%, measures the probability of detecting a real effect when one exists. This free calculator uses a standard formula that accounts for power. Ignoring it means you may miss real improvements even when they exist.

Low-power tests miss real improvements 20-50% of the time.

If you run an A/B test, get an inconclusive result, then run it again with the same hypothesis until it shows a win, you are not doing A/B testing. You are doing selective reporting. Each additional run compounds your false positive risk. A test that fails to reach significance should be redesigned, not repeated.
Repeated testing on the same hypothesis creates a 30%+ false positive risk.
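To see why peeking (the first mistake above) inflates false positives, here is a small Monte Carlo sketch in Python. It runs A/A tests, where both variants convert at the same 3% rate, so any declared winner is by definition a false positive; the sample size, checkpoint interval, and run count are arbitrary illustrative choices:

```python
import math
import random

def z_stat(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-statistic with a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else ((conv_b / n_b) - (conv_a / n_a)) / se

def aa_test(rate=0.03, n=2000, peek_every=100):
    """One A/A test: returns (peeking declared a winner, fixed-horizon winner)."""
    conv_a = conv_b = 0
    peeked = False
    for i in range(1, n + 1):
        conv_a += random.random() < rate
        conv_b += random.random() < rate
        # Peeking: check significance at every interim checkpoint
        if i % peek_every == 0 and abs(z_stat(conv_a, i, conv_b, i)) > 1.96:
            peeked = True  # a peeker would have stopped and shipped here
    # Fixed horizon: check once, at the planned sample size
    fixed = abs(z_stat(conv_a, n, conv_b, n)) > 1.96
    return peeked, fixed

random.seed(1)
results = [aa_test() for _ in range(1000)]
print(f"Peeking:       {sum(p for p, _ in results) / 1000:.1%} false positives")
print(f"Fixed horizon: {sum(f for _, f in results) / 1000:.1%} false positives")
```

The fixed-horizon check should land near the nominal 5% rate, while the peeking rate typically comes out several times higher, even though no real difference exists between the variants.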
Run Better Tests

These strategies help you design, run, and interpret A/B tests that produce results your team can trust and act on with confidence.
Your required sample size does not change based on which page you test. But reaching that sample size much faster on a page with 10,000 monthly visitors versus 1,000 dramatically accelerates your optimization cycle. Start testing on your highest-traffic pages to get actionable results in days, not months.
Test the elements most likely to produce large effect sizes first. Popup headline copy, form field count, and lead magnet offer type consistently produce the largest lifts and therefore require the smallest sample sizes. Test button colors last, not first.
Even if your required sample size is reached in 3 days, extend the test to cover at least a full 7-day cycle. Day-of-week traffic patterns, behavioral differences between weekday and weekend visitors, and external events all create noise that a partial-week test cannot account for.
Adding a countdown timer to an offer popup or landing page typically produces a 1-3% conversion lift, making it one of the highest-MDE elements you can test. This means you need a relatively small sample to confirm the result, making countdown timer tests ideal for sites with moderate traffic.
Moving testimonials above the fold versus below it, switching from text testimonials to video testimonials, and testing the number of testimonials displayed are all high-leverage tests with meaningful expected effect sizes. Social proof tests often deliver 0.5-1.5% conversion lifts on lead generation pages.
Write down your hypothesis in this format before launching: "We believe that changing X will improve Y for Z visitors because of reason W." Tests without a written hypothesis tend to be redesigned mid-run when early results look unfavorable. A documented hypothesis keeps your test honest.
A popup variant that performs better for organic traffic may perform worse for paid traffic. Always analyze your A/B test results segmented by traffic source, device type, and new versus returning visitors before declaring a universal winner. Aggregate results can mask conflicting behavior across segments.
A testing schedule says you will run one test per month. A testing roadmap prioritizes tests by expected impact, required sample size, and strategic importance. The roadmap approach ensures you are always running the test most likely to produce the biggest business impact with the traffic you have available.
A/B Testing Glossary
Understanding the statistical concepts behind A/B testing helps you design better tests and explain your results clearly to stakeholders.
| Term | Definition | Formula / Rule | When to Use |
|---|---|---|---|
| Statistical Significance | The probability that your test result is not due to random chance. A 95% significance level means there is only a 5% chance the observed lift is noise rather than a real improvement. | 1 - (p-value) | Deciding whether a test result is trustworthy before declaring a winner |
| Minimum Detectable Effect (MDE) | The smallest improvement in conversion rate that your test is designed to reliably detect. Smaller MDE requires larger sample sizes. MDE should be set at the minimum lift that would be meaningful for your business. | Set by you based on business context | Calculating required sample size and evaluating whether a test is feasible at your traffic level |
| Statistical Power | The probability that your test will detect a real effect when one exists. Standard power is 80%, meaning you accept a 20% chance of missing a real improvement. | 1 - Beta (typically 0.80) | Ensuring your test design will catch real improvements, not just confirm null results |
| False Positive Rate (Alpha) | The probability of declaring a winner when there is actually no real difference between variants. At 95% significance, your false positive rate is 5%. | 1 - Statistical Significance | Understanding the risk of incorrectly implementing a losing variant as a winner |
| Confidence Interval | A range of values that likely contains the true conversion rate difference between variants. A wider confidence interval means less certainty about the exact lift size. | Mean ± (Z × Standard Error) | Reporting test results to stakeholders and understanding the range of possible outcomes |
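As a worked illustration of the confidence interval entry above, this hypothetical Python snippet computes the interval for the lift between two variants (the function name and traffic figures are invented for the example):

```python
import math

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """95% confidence interval for the conversion rate difference (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# 4,500 visitors per variant: control converts 3.0%, variant 3.6%
low, high = diff_confidence_interval(135, 4500, 162, 4500)
print(f"Lift: {low:+.2%} to {high:+.2%}")  # roughly -0.14% to +1.34%
```

Because the interval spans zero, this test has not demonstrated a real lift despite the variant's higher observed rate; more data would be needed to narrow the range.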
From the Blog
Dig deeper into the strategies behind building a rigorous A/B testing program that systematically improves your conversion rate over time.
In this article, we are going to explain how to create effective landing pages.
In this article, we'll discuss 24 of the most effective ways to increase this number and keep your customers coming back.
This article will discuss the importance of a great Black Friday and Cyber Monday landing page.
In this article, we are going to discuss the top seven successful micro SaaS start-ups.
In this article, we are going to discuss the top seven micro SaaS companies.
Discover what micro SaaS is and how you can use this targeted approach to your benefit.