Determining statistical significance for email split tests

Quite often I come across the question: how can I tell whether my email test results are statistically significant? Recently someone asked it again on EmailMarketersclub.com. I won’t dive into the debate about whether (or when) this is necessary at all, or whether it would be better to simply rely on brainpower and gut feeling when evaluating tests in internet marketing. Instead, I’ll just provide you with a quick, yet well-founded and flexible solution.

Performing a Chi-squared test using R:

  1. First, install R on your computer if you haven’t done so yet. (R is one of the most widely used statistical programming languages and software environments today.)
  2. Open its user interface, e.g. by double-clicking “Rgui.exe” in the “bin” directory of your R installation on Windows. In addition, open a text editor, e.g. by pressing WIN + R and typing “notepad” on Windows machines.
  3. Now, copy and paste the following source code into your text editor:
    # 2x2 table: columns = test groups, rows = openers / non-openers
    # (the outer parentheses make R print the matrix so you can check your input)
    (test.mat <- matrix(c(
      300,700,   # group [,1]: 300 opened, 700 did not
      341,659),  # group [,2]: 341 opened, 659 did not
      nrow=2,
      dimnames=list(c("openers","nonopeners"))))
    # run the chi-squared test (Yates’ continuity correction is applied by default for 2x2 tables)
    chisq.test(x=test.mat)
  4. After that, adjust the four frequencies according to your A/B split test results. In my example above, for instance, I sent 1000 (= 300+700) emails with subject line A to group [,1] and 1000 (= 341+659) emails with a slightly different subject line B to group [,2]. Within group [,1], 300 recipients opened and 700 did not (30% open rate). In group [,2], accordingly, 341 opened whereas 659 did not (34.1% open rate).
  5. Copy your adjusted code from the text editor, paste it into the R console and hit enter. Watch the (blue) output. Besides echoing the matrix, it reports the test result; for the example data it reads something like “X-squared = 3.6734, df = 1, p-value = 0.05529”.
  6. In the last output line, examine the “p-value”. It is the error probability you accept when assuming a dependency between subject lines (or whatever parameter you test) and open rates (or whatever performance indicator you use). The smaller the better! If it fits your desired confidence level, you can speak of statistically significant results. Otherwise, your test couldn’t prove them, and we assume the differences in open rates were due to chance. Take, for example, a confidence level of 95%. This allows an error probability of 5% (100% - 95%), i.e. you accept a false positive in one out of twenty cases (100 / 5). A false positive means you see a statistically significant impact of e.g. the subject line on e.g. email opens, although there is in fact none.
  7. To sum it up, the p-value in our example is greater than our desired error probability (5.529% > 5.0%). We have not achieved significant results at the 95% confidence level.
    (Note that we would have, had group [,2] gained just one more opener – test it! An equivalent way to run this two-group check is sketched right after this list.)
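
As a side note, the same two-group comparison can also be run through R’s built-in prop.test(), which performs the identical chi-squared test (with continuity correction) but phrases the result in terms of open rates: it prints both rates plus a confidence interval for their difference. A minimal sketch using the example counts from step 4:

# 300 of 1000 recipients opened for subject line A, 341 of 1000 for subject line B
prop.test(x=c(300,341), n=c(1000,1000), conf.level=0.95)

The reported p-value matches the chisq.test() call above, so use whichever output you find easier to read.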

Adding a bit more output:

If you want some additional explanatory text in your R output, or if you run e.g. an A/B/C/D split test, use the following code snippet as a template; it also calculates the critical X-squared value:

# rows = openers / non-openers, columns = test groups A, B, C, D
test.mat <- matrix(data=c(
  300,341,360,310,
  700,659,640,690),
  nrow=2,byrow=TRUE,
  dimnames=list(c("openers","nonopeners")))
# specify confidence level
conf <- 0.95
# degrees of freedom = (rows-1)*(columns-1) and the corresponding critical X-squared value
df <- prod(dim(test.mat)-1)
chisq.crit <- qchisq(conf,df)
# run the test and print its standard output
print(test.res <- chisq.test(x=test.mat))
# additional screen output
cat("Critical X-squared is",chisq.crit,"for a",conf,"confidence level and",df,"degree(s) of freedom\n")
if (chisq.crit >= test.res$statistic) {
  cat("RESULT:",chisq.crit,">=",test.res$statistic,"=> not significant\n")
} else {
  cat("RESULT:",chisq.crit,"<",test.res$statistic,"=> significant with an error probability of p =",round(test.res$p.value*100,2),"%\n")
}

You can adjust your confidence level by changing the 0.95 to e.g. 0.9 or 0.8.
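
To see how this shifts the decision threshold, you can query the critical values directly. For the four-group example above (3 degrees of freedom) they drop from roughly 7.81 at a 95% confidence level to about 6.25 at 90% and 4.64 at 80%:

# critical X-squared values for 3 degrees of freedom at different confidence levels
qchisq(c(0.95,0.90,0.80), df=3)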
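
One more thing the snippet above does not cover: a significant result for an A/B/C/D test only tells you that at least one variant differs, not which ones. A possible follow-up is to compare the variants pairwise with p-values adjusted for multiple testing, e.g. via R’s pairwise.prop.test():

# openers and emails sent per variant (same counts as in the matrix above)
openers <- c(300,341,360,310)
sent <- c(1000,1000,1000,1000)
pairwise.prop.test(x=openers, n=sent, p.adjust.method="holm")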

