Statistical Significance Calculator

Check whether your A/B test or holdout experiment results are statistically significant using a two-proportion z-test.

Example result:

Treatment Rate: 3.20%
Holdout Rate: 2.60%
Absolute Lift: 0.60%
Relative Lift: 23.1%
P-Value: 0.1576
95% CI: -0.18% to 1.38%
Verdict: Not Significant at p < 0.05

Understanding Statistical Significance in Marketing

Statistical significance is a measure of confidence that an observed difference between two groups is real and not the result of random variation. In the context of marketing experiments, it tells you whether the lift you measured from a campaign is likely a genuine effect or could have arisen by chance.

This calculator uses a two-proportion z-test, which is the standard statistical test for comparing conversion rates between two independent groups. It is the same test used by Scalversion's built-in measurement engine, academic research, and leading experimentation platforms.

The test works by computing a pooled conversion rate across both groups, then calculating the standard error to estimate how much variation you would expect under the null hypothesis (that both groups have the same conversion rate). The z-score measures how many standard errors the observed difference is from zero, and the p-value converts this to a probability.
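The steps above can be sketched in a few lines of Python. The conversion counts here (96 of 3,000 in treatment, 78 of 3,000 in holdout) are hypothetical values chosen to reproduce the 3.20% and 2.60% rates from the example, so the resulting p-value will not exactly match the one shown above.

```python
import math

def two_proportion_z_test(conv_t, n_t, conv_h, n_h):
    """Two-proportion z-test: returns (absolute lift, z-score, two-sided p-value)."""
    p_t = conv_t / n_t  # treatment conversion rate
    p_h = conv_h / n_h  # holdout conversion rate
    # Pooled rate under the null hypothesis (both groups share one true rate)
    p_pool = (conv_t + conv_h) / (n_t + n_h)
    # Standard error of the difference, assuming the null is true
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_h))
    z = (p_t - p_h) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_t - p_h, z, p_value

# Hypothetical counts: 3.20% vs 2.60% on 3,000 users per group
lift, z, p = two_proportion_z_test(96, 3000, 78, 3000)
```

At these sample sizes the 0.6-point lift is not significant; the same rates measured on much larger groups would be, which is why sample size matters as much as the lift itself.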

How to Interpret Your Results

A p-value below 0.05 is the conventional threshold for statistical significance. When your result is significant, you can be reasonably confident that the campaign had a real effect on conversion rates. The confidence interval gives you the plausible range for the true size of that effect.

If your result is not significant, it does not mean the campaign had no effect. It means you do not have enough evidence to conclude there was an effect with the data you have. This often happens when sample sizes are too small or the true effect size is very small. Increasing your holdout group size or running the campaign longer can help achieve significance.

Pay attention to the confidence interval as well. A significant result with a wide confidence interval (e.g., 0.1% to 5.0% lift) gives you less precision than one with a narrow interval (e.g., 2.1% to 2.9% lift). Narrow intervals require larger sample sizes but give you more actionable insights for budget allocation.
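The effect of sample size on interval width is easy to see with a standard Wald confidence interval for the difference in proportions. Holding the two rates fixed at the 3.20% and 2.60% from the example and varying only the (hypothetical) group size:

```python
import math

def lift_confidence_interval(p_t, n_t, p_h, n_h, z_crit=1.96):
    """95% Wald confidence interval for the difference in two conversion rates."""
    # Unpooled standard error of the observed difference
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_h * (1 - p_h) / n_h)
    lift = p_t - p_h
    return lift - z_crit * se, lift + z_crit * se

# Same observed rates, increasing users per group: the interval tightens
for n in (1000, 5000, 25000):
    lo, hi = lift_confidence_interval(0.032, n, 0.026, n)
    print(f"n={n:>6}: {lo:+.2%} to {hi:+.2%}")
```

The interval is centered on the same 0.60% lift in every case; only its width changes, shrinking roughly with the square root of the sample size.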

Frequently Asked Questions

What does statistical significance mean?

Statistical significance means there is strong evidence that the observed difference between two groups is not due to random chance. A result is typically considered significant when the p-value is below 0.05, meaning that if the groups were truly identical, a difference at least this large would appear by chance less than 5% of the time.

What is a p-value?

A p-value is the probability of observing a difference as large as (or larger than) the one in your data, assuming there is no real difference between the groups. A p-value of 0.03 means that if the groups were truly identical, a difference this large would arise by chance only 3% of the time. Lower p-values indicate stronger evidence of a real effect.

How many conversions do I need for significance?

The number of conversions needed depends on the size of the effect you are measuring. Larger effects (e.g., 20% lift) require fewer conversions to detect, while smaller effects (e.g., 2% lift) require many more. As a rough guide, you typically need at least 100 conversions per group for reliable results.
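The relationship between effect size and sample size can be estimated with the standard normal-approximation formula for two proportions. This is a rough planning sketch, not an exact calculation: the z-values 1.96 and 0.84 correspond to the conventional 5% two-sided significance level and 80% power, and the base rate below is an illustrative assumption.

```python
import math

def required_sample_size(p_base, rel_lift, alpha_z=1.96, power_z=0.84):
    """Approximate per-group sample size to detect a relative lift
    at 5% two-sided significance with 80% power (normal approximation)."""
    p_treat = p_base * (1 + rel_lift)
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    effect = p_treat - p_base
    return math.ceil((alpha_z + power_z) ** 2 * variance / effect ** 2)

# Detecting a 20% relative lift on a 2.6% base rate vs a 2% relative lift
big_effect = required_sample_size(0.026, 0.20)
small_effect = required_sample_size(0.026, 0.02)
```

Halving the effect you want to detect roughly quadruples the required sample size, which is why tiny lifts demand very large holdout groups.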

What is a confidence interval?

A confidence interval gives a range within which the true difference between groups likely falls. A 95% confidence interval means that if you repeated the experiment many times, 95% of the intervals would contain the true effect size. Wider intervals indicate more uncertainty in the estimate.

Automated significance testing on every campaign

Scalversion measures statistical significance with holdout groups on every send. No spreadsheets needed.

Start Free Pilot