Quite often, I come across the question: how can I determine whether my email test results are statistically significant? Recently someone asked this on EmailMarketersclub.com again. I won’t dive into the discussion of whether (or when) this is necessary at all, or whether it would be better to just rely on brainpower and gut feeling when evaluating tests in internet marketing. Instead, I’ll just provide you with a quick, yet well-founded and flexible solution.
Performing a Chi-squared test using R:
- First, install R on your computer if you haven’t done so yet. (R is one of the most widely used statistical programming languages and software environments today.)
- Open its user interface by e.g. double-clicking “Rgui.exe” in the “bin” directory of your R installation. In addition, open a text editor, e.g. by pressing WIN + R and entering “notepad” on Windows machines.
- Now, copy and paste the following source code into your text editor:
(test.mat <- matrix(c(
300,700,
341,659),2,
dimnames=list(c("openers","nonopeners"),NULL)))
chisq.test(x=test.mat)
- After that, adjust the four frequencies according to your A/B split test results. In my example above, for instance, I sent 1000 (= 300+700) emails with subject line A to group [,1] and 1000 (= 341+659) emails with a slightly different subject line B to group [,2]. Within group [,1], 300 recipients opened and 700 did not (30% open rate). In group [,2], accordingly, 341 opened whereas 659 did not (34.1% open rate).
- Copy your adjusted code from the text editor, paste it into the R console and hit enter. Watch the output (blue). It should say something like this:
        Pearson's Chi-squared test with Yates' continuity correction
data:  test.mat
X-squared = 3.6734, df = 1, p-value = 0.05529
- In the last output line, examine the “p-value”. It’s the error probability you accept when assuming a significant dependency between subject lines (or whatever parameter you test) and open rates (or whatever performance indicator you use). The smaller, the better! If it fits your desired confidence level, you can speak of statistically significant results. Otherwise, your test couldn’t prove them; we then assume that differences in open rates were due to chance. Take, for example, a confidence level of 95%. This allows an error probability of 5% (100% - 95%), i.e. you accept a false positive in one out of twenty cases (100 / 5). A false positive means you see a statistically significant impact of e.g. subject lines on e.g. email opens, although there is in fact none.
- To sum it up, the p-value in our example is greater than our desired error probability (5.529% > 5.0%). We have not achieved significant results – at a 95% confidence level.
(Note that we would reach significance with just one more opener in [,2] – test it!)
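If you want to double-check the arithmetic without R, the same Yates-corrected chi-squared test for a 2x2 table can be sketched in a few lines of plain Python (the helper function below is my own illustration, not part of the article's R workflow):

```python
import math

def chisq_2x2_yates(a, b, c, d):
    """Chi-squared test with Yates' continuity correction for the 2x2
    table [[a, c], [b, d]] (columns = groups A/B, rows = openers /
    non-openers). Returns (X-squared, p-value) for 1 degree of freedom."""
    n = a + b + c + d
    openers, nonopeners = a + c, b + d   # row totals
    group_a, group_b = a + b, c + d      # column totals (group sizes)
    x2 = 0.0
    for obs, row, col in ((a, openers, group_a), (b, nonopeners, group_a),
                          (c, openers, group_b), (d, nonopeners, group_b)):
        exp = row * col / n              # expected frequency
        x2 += (abs(obs - exp) - 0.5) ** 2 / exp
    p = math.erfc(math.sqrt(x2 / 2))     # chi-squared survival fn., 1 df
    return x2, p

x2, p = chisq_2x2_yates(300, 700, 341, 659)
print(round(x2, 4), round(p, 5))   # 3.6734 0.05529 -- matches the R output

# ...and with one more opener in group [,2]:
x2b, pb = chisq_2x2_yates(300, 700, 342, 658)
print(pb < 0.05)                   # True -- now significant at 95%
```

This reproduces R's X-squared = 3.6734 and p-value = 0.05529, and confirms that a single additional opener tips the test into significance.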
Adding a bit more output:
If you want some additional explanatory text in your R output or if you conduct e.g. a Split A/B/C/D test, use this code snippet as a template, which also calculates the critical X-squared value:
test.mat <- matrix(c(
300,341,360,310,
700,659,640,690),
nrow=2,byrow=T,
dimnames=list(c("openers","nonopeners"),NULL))
# specify confidence level
conf <- 0.95
# do test
df <- prod(dim(test.mat)-1) # degrees of freedom = (rows-1)*(columns-1)
chisq.crit <- qchisq(conf,df)
print(test.res <- chisq.test(x=test.mat))
# additional screen output
cat("Critical X-squared is",chisq.crit,"for a",conf,"confidence level and",df,"degree(s) of freedom")
if(chisq.crit >= test.res$statistic) {
cat("RESULT:",chisq.crit,">=",test.res$statistic,"=> not significant")
} else {
cat("RESULT:",chisq.crit,"<",test.res$statistic,"=> significant with an error probability of p =",round(test.res$p.value*100,2),"%")
}
You can adjust your confidence level by changing the 0.95 to e.g. 0.9 or 0.8.
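As a cross-check for the multi-group case, the same decision rule can be sketched in plain Python: compute the Pearson X-squared statistic for a 2 x k table (without continuity correction, as chisq.test does for more than two columns) and get the p-value from the closed-form survival function for 3 degrees of freedom. The function names are my own illustration:

```python
import math

def chisq_2xk(openers, nonopeners):
    """Pearson chi-squared statistic for a 2 x k contingency table
    (no continuity correction). Returns (X-squared, degrees of freedom)."""
    n = sum(openers) + sum(nonopeners)
    tot_open, tot_non = sum(openers), sum(nonopeners)  # row totals
    x2 = 0.0
    for o, m in zip(openers, nonopeners):
        col = o + m                       # group size (column total)
        for obs, rowtot in ((o, tot_open), (m, tot_non)):
            exp = rowtot * col / n        # expected frequency
            x2 += (obs - exp) ** 2 / exp
    return x2, len(openers) - 1

def chisq_sf_3df(x):
    """P(X > x) for a chi-squared variable with 3 degrees of freedom
    (closed form; valid for df = 3 only)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

x2, df = chisq_2xk([300, 341, 360, 310], [700, 659, 640, 690])
p = chisq_sf_3df(x2)
print(round(x2, 2), df)   # X-squared ~ 10.44 on 3 degrees of freedom
print(p < 1 - 0.95)       # True -- significant at the 95% confidence level
```

The critical value printed by the R snippet (qchisq(0.95, 3), about 7.81) leads to the same verdict here, since 10.44 > 7.81.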