AB Test Calculator

Enter your visitor counts and conversions for Variation A and Variation B, choose your hypothesis type and confidence level, and the AB Test Calculator tells you whether your result is statistically significant. You get the conversion rates for both variants, the relative uplift, p-value, and a clear pass/fail significance verdict.

Total number of unique visitors in the control group (A).

Number of conversions recorded for the control group (A).

Total number of unique visitors in the variant group (B).

Number of conversions recorded for the variant group (B).

A one-sided test checks only whether the variant outperforms the control; a two-sided test also accounts for the possibility that your variant could perform worse.

How confident you want to be that the result is not due to random chance.

Results

Significant Result

--

Conversion Rate A

--

Conversion Rate B

--

Relative Uplift

--

p-value

--

Z-Score

--

Statistical Power

--

Conversion Rate Comparison: A vs B

Frequently Asked Questions

What is A/B testing and why is it important?

A/B testing (also called split testing) is a method of comparing two versions of a webpage, email, or other asset to determine which one performs better. You show version A to one group and version B to another, then measure which produces more conversions. It removes guesswork from optimization decisions and ensures changes are backed by real user data.

What does statistical significance mean in an A/B test?

Statistical significance tells you how confident you can be that the difference in conversion rates between A and B is real and not just due to random chance. A 95% confidence level means you only declare a winner when the p-value falls below 0.05 — that is, you accept at most a 5% chance of a false positive if there were truly no difference between the variants. The lower the p-value, the stronger the evidence against the null hypothesis.

What is a p-value and how do I interpret it?

The p-value is the probability of observing a result at least as extreme as your data if there were truly no difference between variants. A p-value below 0.05 corresponds to 95% confidence. The smaller the p-value (e.g. 0.01), the stronger the evidence that variant B is genuinely different from variant A.
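The p-value described above comes from a standard two-proportion z-test. The following Python sketch shows one way to compute it (a minimal illustration using only the standard library, not the calculator's actual implementation — the function name and signature are our own):

```python
from math import erf, sqrt

def two_proportion_z_test(visitors_a, conv_a, visitors_b, conv_b, two_sided=True):
    """Return (z-score, p-value) for a two-proportion z-test.

    Uses the pooled conversion rate for the standard error, since the
    null hypothesis assumes A and B share the same true rate.
    """
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate under the null hypothesis of no difference
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se

    # Standard normal CDF via the error function
    def norm_cdf(x):
        return 0.5 * (1 + erf(x / sqrt(2)))

    if two_sided:
        p_value = 2 * (1 - norm_cdf(abs(z)))
    else:
        p_value = 1 - norm_cdf(z)
    return z, p_value
```

For example, 100 conversions from 1,000 visitors for A versus 130 from 1,000 for B yields a z-score just above 2 and a two-sided p-value below 0.05 — a significant result at the 95% level.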

Should I use a one-sided or two-sided test?

Use a one-sided test if you only care whether variant B is better than A (and would never roll out B if it were worse). Use a two-sided test if you want to detect any meaningful difference in either direction — this is generally recommended as it guards against accidentally shipping a harmful change.

How many visitors do I need for a valid A/B test?

Sample size depends on your current baseline conversion rate, the minimum detectable effect you care about, your desired confidence level, and the statistical power you want (typically 80%). As a rule of thumb, most A/B tests need at least several hundred conversions per variation to achieve reliable results. Running the test with too few visitors leaves it underpowered, increasing the risk of missing a real effect (a false negative) and of fluke results that fail to replicate.
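The required sample size per variation can be estimated with the standard two-proportion formula, sketched below (an approximation for a two-sided test; the function name and defaults are our own, not the calculator's code):

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variation(baseline_rate, min_detectable_effect,
                              alpha=0.05, power=0.80):
    """Approximate visitors needed per variation.

    baseline_rate         -- current conversion rate, e.g. 0.02 for 2%
    min_detectable_effect -- relative uplift to detect, e.g. 0.20 for +20%
    alpha                 -- two-sided significance level (0.05 = 95% confidence)
    power                 -- desired probability of detecting a real effect
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_effect)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)            # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)
```

With a 2% baseline and a 20% minimum detectable uplift at 95% confidence and 80% power, this yields roughly 21,000 visitors per variation — which illustrates why small expected effects at low baseline rates demand substantial traffic.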

What is statistical power in an A/B test?

Statistical power is the probability that your test will correctly detect a real difference when one exists. A power of 80% means you have an 80% chance of finding a significant result if the true uplift equals your expected uplift. Low power increases the risk of a false negative — concluding there is no difference when there actually is one.
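An approximate power figure for a two-sided two-proportion z-test can be computed from the observed rates and sample sizes, as in this sketch (a simplified normal-approximation formula; the function name is our own):

```python
from math import sqrt
from statistics import NormalDist

def achieved_power(n_a, rate_a, n_b, rate_b, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test,
    assuming the true rates equal the observed ones."""
    nd = NormalDist()
    # Unpooled standard error of the difference in rates
    se = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    effect = abs(rate_b - rate_a) / se
    return nd.cdf(effect - z_alpha)
```

For example, 1,000 visitors per variation at 10% vs 13% conversion gives power of only around 55–60% — even though that difference is significant, a test this size would miss a real uplift of that magnitude almost half the time.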

Why am I not getting a significant result?

Common reasons include insufficient sample size, too short a test duration, a real uplift that is smaller than your minimum detectable effect, or high variability in your data. You may need to collect more data, run the test longer, or reconsider whether the change is impactful enough to measure with your current traffic levels.

What is relative uplift versus absolute difference?

Absolute difference is simply Conversion Rate B minus Conversion Rate A. Relative uplift expresses that difference as a percentage of the original rate — e.g. if A converts at 2% and B at 2.4%, the absolute difference is 0.4 percentage points but the relative uplift is 20%. Relative uplift is often more meaningful when comparing tests across different baseline rates.
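The distinction above is simple arithmetic, shown here as a short sketch (the function name is our own):

```python
def uplift(rate_a, rate_b):
    """Return (absolute difference, relative uplift) for two conversion rates."""
    absolute = rate_b - rate_a
    relative = absolute / rate_a
    return absolute, relative

# The example from the answer above: A converts at 2%, B at 2.4%
abs_diff, rel_uplift = uplift(0.02, 0.024)
# abs_diff is 0.4 percentage points; rel_uplift is 0.20, i.e. a 20% uplift
```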

More Statistics Tools