How to Measure If Your Email Campaigns Actually Work
Every email marketer tracks open rates, click rates, and conversion rates. These metrics are easy to measure and easy to report. They are also dangerously misleading.
The core problem is simple: when someone opens your email and then makes a purchase, you do not know if the email caused the purchase. Maybe they were already planning to buy. Maybe they saw a Facebook ad an hour earlier. Maybe they had a coupon from last week. Open-and-click metrics tell you that people interacted with your email. They do not tell you that your email changed their behavior.
This article walks through the real way to measure email campaign effectiveness: holdout testing. It is not new science, but it is dramatically underused in email marketing.
The Problem with Standard Email Metrics
Let's start with what the typical email dashboard shows you:
**Open rate:** What percentage of recipients opened the email. This has been increasingly unreliable since Apple's Mail Privacy Protection started pre-loading email pixels in 2021. On iOS, nearly every email shows as "opened" whether the person actually read it or not.
**Click rate:** What percentage clicked a link. Better than open rate, but still does not tell you if the click led to a purchase, or if the person would have purchased anyway.
**Conversion rate / attributed revenue:** This is where the real trouble starts. Most platforms use last-touch or multi-touch attribution to credit revenue to campaigns. If someone received your email and purchased within a 5-day window, the campaign gets credit. But correlation is not causation.
Here is a concrete example. You send a winback email to 20,000 customers who have not purchased in 60 days. Over the next week, 400 of them make a purchase totaling $32,000. Your dashboard shows $32,000 in attributed revenue. But how many of those 400 customers would have purchased anyway? Without a control group, you have no idea. The real incremental revenue could be $8,000 or it could be $28,000, and the dashboard cannot tell you which. You are flying blind.
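To make that ambiguity concrete, here is a minimal sketch. The $32,000 and 400 buyers come from the example above; the baseline shares (how many would have bought anyway) are hypothetical, since that is exactly the number attribution cannot observe:

```python
# Hypothetical winback send: 20,000 recipients, $32,000 attributed revenue
# from 400 purchasers.
attributed_buyers = 400
avg_order_value = 32_000 / attributed_buyers  # $80

# Assumed baseline shares: the fraction of those buyers who would have
# purchased anyway. Unknowable without a control group.
for baseline_share in (0.25, 0.50, 0.75):
    organic_buyers = attributed_buyers * baseline_share
    incremental = (attributed_buyers - organic_buyers) * avg_order_value
    print(f"baseline {baseline_share:.0%}: incremental ${incremental:,.0f}")
```

Depending on the assumed baseline, the same $32,000 dashboard number corresponds to anywhere from $8,000 to $24,000 of truly incremental revenue.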
How Holdout Testing Works
Holdout testing solves this by introducing a simple comparison:
1. Take your target audience for a campaign (say, 20,000 customers).
2. Randomly hold out 10% (2,000 customers) who will not receive the campaign.
3. Send the campaign to the remaining 18,000 (the treatment group).
4. Wait for your measurement window (typically 7-14 days).
5. Compare revenue per customer in both groups.
If the treatment group averaged $1.78 per customer and the holdout averaged $1.35, your campaign drove an incremental $0.43 per customer. Multiply by 18,000 treated customers, and you have $7,740 in true incremental revenue.
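The arithmetic is simple enough to script. A minimal sketch using the numbers from the example above:

```python
# Treatment and holdout sizes from the example above.
treated_n, holdout_n = 18_000, 2_000

# Revenue per customer over the measurement window.
treated_rev_per_customer = 1.78
holdout_rev_per_customer = 1.35

# Incremental lift per customer, scaled to the treated audience.
lift_per_customer = treated_rev_per_customer - holdout_rev_per_customer
incremental_revenue = lift_per_customer * treated_n

print(f"lift per customer: ${lift_per_customer:.2f}")       # $0.43
print(f"incremental revenue: ${incremental_revenue:,.0f}")  # $7,740
```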
Notice how different that might be from the $32,000 your attribution dashboard would show. The $32,000 includes all the revenue from people who would have bought anyway. The $7,740 is what your campaign actually caused.
Both numbers are useful. The $32,000 tells you that your campaign reached people who spent money. The $7,740 tells you how much of that money you can actually take credit for.
Step-by-Step: Running Your First Holdout Test
**Step 1: Pick one campaign to test.** Start with a campaign that is important to your business and runs regularly, like a winback flow, a post-purchase upsell, or a weekly promotional send. Do not try to test everything at once.
**Step 2: Determine your holdout size.** For most campaigns, 10% works well. If your audience is small (under 5,000), consider 15-20% so the holdout is large enough to reach statistical significance in a reasonable time. If your audience is very large (over 100,000), you can get away with 5%.
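Why does audience size matter so much? A standard sample-size approximation for comparing two conversion rates shows that small lifts need large groups. This is a textbook normal-approximation formula, not something specific to any ESP, and the 1.7% vs. 2.0% conversion rates below are illustrative assumptions:

```python
from math import ceil
from statistics import NormalDist

def per_group_n(p_base: float, p_treated: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size to detect p_treated vs. p_base
    with a two-sided two-proportion test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for 80% power
    variance = p_base * (1 - p_base) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_treated - p_base) ** 2)

# Detecting a lift from 1.7% to 2.0% conversion takes tens of
# thousands of customers per group.
print(per_group_n(0.017, 0.020))
```

This is why a 10% holdout is comfortable at scale but too thin for a 5,000-person audience: the smaller group simply cannot resolve a modest lift.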
**Step 3: Create the holdout segment.** In your ESP, create a random segment of the target audience to exclude. Make sure the randomization is truly random, not based on engagement or spend tiers. Some ESPs have built-in A/B testing features you can repurpose for this.
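If your ESP does not offer clean randomization, a hash-based split is a simple, deterministic alternative: the same customer always lands in the same group for a given campaign, and assignment is independent of engagement or spend. This is a generic sketch, not any particular ESP's API; the customer IDs and campaign name are made up:

```python
import hashlib

def in_holdout(customer_id: str, campaign: str,
               holdout_pct: float = 0.10) -> bool:
    """Deterministic pseudo-random assignment via hashing."""
    digest = hashlib.sha256(f"{campaign}:{customer_id}".encode()).hexdigest()
    # Map the first 8 hex chars to a uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < holdout_pct

audience = [f"cust_{i}" for i in range(20_000)]
holdout = [c for c in audience if in_holdout(c, "winback_q3")]
print(len(holdout))  # roughly 2,000
```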
**Step 4: Run the campaign normally.** Send to everyone except the holdout. Do not change anything else about the campaign.
**Step 5: Wait and measure.** After 7-14 days, pull revenue data for both groups. Calculate revenue per customer for each group; the difference between them is your incremental lift per customer.
**Step 6: Check significance.** Use a two-proportion z-test or an online significance calculator to verify that your result is statistically significant (p < 0.05). If you do not have significance, you may need a larger sample size or a longer measurement window.
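A two-proportion z-test is short enough to run yourself. Here is a minimal sketch; the converter counts below (400 of 18,000 treated vs. 30 of 2,000 held out) are hypothetical numbers for illustration:

```python
from math import sqrt, erfc

def two_proportion_z_test(conv_treated: int, n_treated: int,
                          conv_holdout: int, n_holdout: int):
    """Two-sided z-test for a difference in conversion rates.
    Returns (z, p_value)."""
    p1 = conv_treated / n_treated
    p2 = conv_holdout / n_holdout
    # Pooled proportion under the null hypothesis of no difference.
    p = (conv_treated + conv_holdout) / (n_treated + n_holdout)
    se = sqrt(p * (1 - p) * (1 / n_treated + 1 / n_holdout))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value
    return z, p_value

z, p = two_proportion_z_test(400, 18_000, 30, 2_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these illustrative counts the result clears p < 0.05, but only barely, which is a useful reminder that modest conversion differences need large groups.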
Scalversion handles steps 2 through 6 automatically. Every campaign is deployed with a deterministic holdout, and the measurement report includes lift estimates, confidence intervals, and significance tests.
What to Do with the Results
Once you have holdout-measured results, you can make much sharper decisions:
**Kill underperforming campaigns.** If a campaign shows zero or near-zero incremental lift, it is not driving value. You might be annoying customers with messages that do not change their behavior. Cut it or redesign it.
**Double down on winners.** If a campaign shows strong lift, consider increasing its frequency, expanding the audience, or investing more in personalization.
**Optimize the right thing.** Instead of optimizing for click rates (which do not guarantee revenue), optimize for incremental lift. A boring email with high lift is better than a beautiful email with no lift.
**Build credibility with leadership.** When you present campaign results backed by holdout measurement, you are speaking the language of finance: causal impact, statistical confidence, incremental revenue. This builds trust and budget.
Conclusion
Standard email metrics are useful for operational monitoring, but they cannot tell you whether your campaigns are actually driving revenue. Holdout testing can.
The good news is that holdout testing is not complicated. It requires a small sacrifice of audience (the holdout group) in exchange for dramatically better measurement. If you care about proving the value of your email program, this is the single most important practice you can adopt.