Let’s get back to the study from my last blog post for a minute. The outcome on subject line lengths was not the only thing that caught my attention. (Remember: subject lines containing fewer than 10 characters are supposed to perform best.) Another finding I found intriguing was that the day of the week on which a mailing is delivered would have no effect on response rates. Can this be true? Let’s take a practical look at some email data.
Curious as I am, I extracted unique open rates, send times and days of the week for a couple of mailings. Email “Sendtime” was then binned into the two segments
- (-∞* – 13.5] <=> “sent before 1:30 pm” and
- (13.5 – ∞) <=> “sent after 1:30 pm”.
(* “-∞” denotes negative infinity. It is rendered inconsistently throughout this post: in some figures it is displayed as an “8”.)
“Weekday” was coded accordingly as
- (-∞ – 3.5] <=> “Monday to Wednesday”,
- (3.5 – 4.5] <=> “Thursday”,
- (4.5 – 5.5] <=> “Friday” and
- (5.5 – ∞) <=> “Weekend”.
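The two binning schemes above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: send times are given as decimal hours (13.5 = 1:30 pm), weekdays are coded Monday = 1 to Sunday = 7, and the names `bin_value` and `to_cell` are made up for this example.

```python
from bisect import bisect_left

# Bin edges taken from the text; every interval is treated as right-closed,
# so a value that sits exactly on an edge falls into the lower bin.
SENDTIME_EDGES = [13.5]            # assumption: decimal hours, 13.5 = 1:30 pm
SENDTIME_LABELS = ["before 1:30 pm", "after 1:30 pm"]
WEEKDAY_EDGES = [3.5, 4.5, 5.5]    # assumption: Monday = 1, ..., Sunday = 7
WEEKDAY_LABELS = ["Monday to Wednesday", "Thursday", "Friday", "Weekend"]


def bin_value(value, edges, labels):
    """Return the label of the right-closed interval that contains value."""
    # bisect_left counts the edges strictly below value.
    return labels[bisect_left(edges, value)]


def to_cell(weekday_code, send_hour):
    """Map one mailing to its (Weekday, Sendtime) cell of the 2x4 design."""
    return (
        bin_value(weekday_code, WEEKDAY_EDGES, WEEKDAY_LABELS),
        bin_value(send_hour, SENDTIME_EDGES, SENDTIME_LABELS),
    )


# A mailing sent on a Friday at 2:15 pm:
print(to_cell(5, 14.25))
```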
In terms of an analysis of variance (ANOVA), this makes it a so-called 2×4 factorial design. Since open rate is the only dependent variable, this is a two-way ANOVA rather than a multivariate ANOVA (MANOVA).
The following figure shows a graphical representation of the so-called contingency table, which contains the frequencies of observations for each of our eight Sendtime/Weekday-cells:
As you can see, the experimental design is highly unbalanced. For instance, we have only a few mailings in the category “sent on the weekend + after 1:30 pm” (top right). On the other hand, we have many mailings that were sent on Friday after 1:30 pm (to its left). This generally makes an ANOVA a bit difficult. Nevertheless, we leave it that way and just have a look at the results.
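To make the idea of cell frequencies concrete, here is a minimal sketch that counts mailings per Sendtime/Weekday cell. The eight mailings are invented for illustration and are not my actual data. The helper `imbalance_ratio` is a made-up convenience, not a standard statistic.

```python
from collections import Counter

# Hypothetical (weekday group, send-time group) labels, one pair per mailing.
mailings = [
    ("Friday", "after"), ("Friday", "after"), ("Friday", "after"),
    ("Friday", "before"), ("Thursday", "before"), ("Weekend", "after"),
    ("Mon-Wed", "before"), ("Mon-Wed", "after"),
]


def contingency_table(cells):
    """Count how many mailings fall into each cell."""
    # Cells without any mailing do not appear as keys; Counter reports 0 for them.
    return Counter(cells)


def imbalance_ratio(table):
    """Largest observed cell count divided by the smallest; 1.0 means balanced."""
    counts = table.values()
    return max(counts) / min(counts)


table = contingency_table(mailings)
for cell, count in sorted(table.items()):
    print(cell, count)
print("imbalance ratio:", imbalance_ratio(table))
```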
First, let us take a look at some box plots to get a feel for the underlying data. Below, I plotted the sampled mailing open rates (black dots) over a box plot (red) over a violin plot (green). The thick red line within each box shows the median open rate for the corresponding group. Besides helping to spot outliers, the plot shows at first sight that the outcomes of some cells differ heavily. However, are those differences significant, i.e. are they larger than chance variation and thus attributable to email timing?
ANOVA for email send timing
In our analysis of variance, we want to test whether send time and day of the week affect email open rates. Do the group means really vary systematically with the timing that the marketer chose? The idea behind an analysis of variance is this: assuming that any effect that is not part of the model influences all cells equally, and that send time and day of the week have no impact on the open rate, we would not expect any differences between the group means beyond random variation. Conversely, clearly different group means would reflect the influence of email timing, i.e. of the send time or the day of the week.
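The reasoning boils down to comparing the variation between group means with the variation within the groups. The following sketch computes the F statistic for the simplest case, a one-way ANOVA with a single factor. It is a simplification of the two-factor model used in this post, and the open rates are invented for illustration.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical open rates for three groups of mailings.
groups = [[0.30, 0.32, 0.34], [0.31, 0.33, 0.35], [0.40, 0.42, 0.44]]
f_value, df1, df2 = one_way_f(groups)
print(round(f_value, 2), df1, df2)
```

If the group means were all equal, the between-group variation would be small compared to the within-group variation and F would be close to or below 1.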
Besides time and weekday influencing open rates separately, it may well be that the combination of both influences the response rate. Think of Monday morning, when the inboxes are stuffed and no one wants to browse through email ads. On the other days, the picture may be completely different. This would mean there are also interactions between our two main effects. Let us visually check our theory by looking at the so-called interaction plots of our two main factors:
If there were no interactions between Sendtime and Weekday, all lines would run parallel to each other. However, both figures indicate that send time seems to be irrelevant for recipients in the first half of the week. This may hint at some interaction between the two factors.
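"Parallel lines" can also be checked numerically: without an interaction, the gap between the two send-time groups would be about the same on every weekday. Here is a small sketch with invented cell data. The cell labels and the function name `sendtime_gaps` are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical open rates per (weekday group, send-time group) cell.
cells = {
    ("Mon-Wed", "before"): [0.33, 0.35],
    ("Mon-Wed", "after"): [0.34, 0.34],
    ("Friday", "before"): [0.30, 0.32],
    ("Friday", "after"): [0.35, 0.37],
}


def sendtime_gaps(cells):
    """Mean open rate 'after' minus 'before', separately per weekday group."""
    weekdays = sorted({weekday for weekday, _ in cells})
    return {
        day: mean(cells[(day, "after")]) - mean(cells[(day, "before")])
        for day in weekdays
    }


gaps = sendtime_gaps(cells)
for day, gap in gaps.items():
    print(day, round(gap, 3))
```

In this made-up example the gap is roughly zero for Monday to Wednesday but about five percentage points on Friday, which is the kind of non-parallel pattern an interaction plot makes visible. Whether such a pattern is more than noise is what the interaction term in the ANOVA tests.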
So let’s build our ANOVA model of Sendtime, Weekday and the interaction between both. Because we have a heavily unbalanced design and we also assume interactions, Type III sums of squares are used. The ANOVA table looks as follows:
Anova Table (Type III tests)
Response: Openrate
                  Sum Sq  Df    F value  Pr(>F) Part Eta Sq
(Intercept)      13.7737   1 11679.1020 0.00000     0.99075
Weekday           0.0146   3     4.1209 0.00824     0.10187
Sendtime          0.0051   1     4.3317 0.03975     0.03822
Weekday:Sendtime  0.0032   3     0.9098 0.43887     0.02443
Residuals         0.1285 109
The most important column in the ANOVA table is “Pr(>F)”, the p-value. It gives the probability of observing an F value at least as large as ours if the means were in fact equal across the factor levels. It is not the probability that there is no effect. At the 5% significance level, our main effects Weekday and Sendtime therefore have a significant effect on email open rates: their p-values are smaller than 0.05. However, the interaction between the two is not significant: “Weekday:Sendtime” has a p-value of 0.44. This does not necessarily mean there are no interactions in reality; perhaps our data simply could not show them.
Another column of interest in the ANOVA table is the last one, labeled “Part Eta Sq” (partial eta squared). It’s one of several measures of effect size. The values show that the main effect of Weekday explains about 10% of the variance that is not attributed to the other effects. Sendtime, however, explains only about 4%.
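Both the F values and the partial eta squared values can be recomputed from the “Sum Sq” and “Df” columns of the ANOVA table. The sketch below does that with the rounded numbers printed in the table, so the results agree with the table only up to rounding. The function names are my own.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
EFFECTS = {
    "Weekday": (0.0146, 3),
    "Sendtime": (0.0051, 1),
    "Weekday:Sendtime": (0.0032, 3),
}
SS_RESIDUAL, DF_RESIDUAL = 0.1285, 109


def f_value(ss_effect, df_effect, ss_residual, df_residual):
    """Mean square of the effect divided by the residual mean square."""
    return (ss_effect / df_effect) / (ss_residual / df_residual)


def partial_eta_squared(ss_effect, ss_residual):
    """Share of effect-plus-residual variation that is due to the effect."""
    return ss_effect / (ss_effect + ss_residual)


for name, (ss, df) in EFFECTS.items():
    f = f_value(ss, df, SS_RESIDUAL, DF_RESIDUAL)
    eta = partial_eta_squared(ss, SS_RESIDUAL)
    print(f"{name}: F = {f:.3f}, partial eta squared = {eta:.3f}")
```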
Speaking of residual variance, an ANOVA relies on several assumptions. One of them is that the residuals have to be normally distributed. This can be evaluated by looking at a so-called Q-Q-plot and doing formal tests, like the Shapiro-Wilk-Test. A second assumption would be that all groups have similar variances. A look at the box plots or performing Levene’s Test can shed light on this.
Shapiro-Wilk normality test
data: residuals
W = 0.9942, p-value = 0.9112

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   7  1.0292  0.415
      109
Both the visual examination of the Q-Q plot and the Shapiro-Wilk test suggest that the residuals are compatible with a normal distribution: with p = 0.91, the test gives no reason to reject normality. (For comparison, I plotted a second Q-Q plot (right figure) of random values drawn from the normal distribution, just to show what it would ideally look like.) In addition, since Levene’s test is far from reaching the 5% significance level (p = 41.5%), we also assume equal variances.
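For the curious: Levene’s test with “center = median” is essentially a one-way ANOVA on the absolute deviations of each observation from its group median. The following sketch computes the test statistic and its degrees of freedom that way. The two groups are invented, and the p-value is left out because it would require the F distribution, which Python’s standard library does not provide.

```python
from statistics import mean, median


def levene_median(groups):
    """Return (W, df1, df2) of Levene's test with center = median."""
    # Absolute deviation of every observation from its own group median.
    z = [[abs(x - median(g)) for x in g] for g in groups]
    all_z = [v for g in z for v in g]
    grand_mean = mean(all_z)
    k, n = len(z), len(all_z)
    # One-way ANOVA F statistic computed on the deviations.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in z)
    within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (between / (k - 1)) / (within / (n - k)), k - 1, n - k


# Hypothetical open rates; the second group is spread twice as wide.
group_a = [0.31, 0.32, 0.33, 0.34, 0.35]
group_b = [0.32, 0.34, 0.36, 0.38, 0.40]
w, df1, df2 = levene_median([group_a, group_b])
print(round(w, 4), df1, df2)
```

With eight cells and 117 mailings, the same formula yields the degrees of freedom 7 and 109 shown in the output above.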
By now, we know that the main effects “Sendtime” and “Weekday” affect open rates. More precisely, their group means are not all equal. However, the factor “Weekday” consists of four levels. Wouldn’t it be interesting to know not only that the levels differ, but also which of them do? One can evaluate this with a so-called post-hoc test. One of them is Dunnett’s modified Tukey-Kramer test, also known as the T3 procedure. The following figure shows the confidence intervals for the differences in means for all Weekday comparisons:
At the 95% confidence level, only the comparison “Weekend vs. Friday” (= red 4-3) shows a significant difference in mean open rates. (Lowering the confidence level to 90% would also include the other two comparisons involving 4 = Weekend.) Since Sendtime has only two factor levels and the interaction between Sendtime and Weekday did not prove to be significant, neither is part of the post-hoc test.
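Four Weekday levels yield six pairwise comparisons. The sketch below enumerates them and computes, for each pair, the difference in means together with an unequal-variance (Welch) standard error and degrees of freedom. Please note that this is not the post-hoc procedure used for the figure: it applies no adjustment for multiple comparisons and therefore yields no simultaneous confidence intervals. The data are invented, and the level codes 1 to 4 merely mirror the labels in the figure.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_pairs(groups):
    """Return (a, b, mean_b - mean_a, Welch SE, Welch df) for every pair of levels."""
    results = []
    for (a, xa), (b, xb) in combinations(groups.items(), 2):
        # Squared standard error of each group mean (separate variances).
        va, vb = variance(xa) / len(xa), variance(xb) / len(xb)
        se = sqrt(va + vb)
        # Welch-Satterthwaite approximation of the degrees of freedom.
        df = (va + vb) ** 2 / (va ** 2 / (len(xa) - 1) + vb ** 2 / (len(xb) - 1))
        results.append((a, b, mean(xb) - mean(xa), se, df))
    return results


# Hypothetical open rates per Weekday level (1 = Mon-Wed, ..., 4 = Weekend).
groups = {
    1: [0.30, 0.32, 0.34],
    2: [0.31, 0.33, 0.35],
    3: [0.29, 0.31, 0.33],
    4: [0.40, 0.42, 0.44],
}
pairs = welch_pairs(groups)
for a, b, diff, se, df in pairs:
    print(f"{b}-{a}: diff = {diff:.3f}, SE = {se:.4f}, df = {df:.1f}")
```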
Conclusion: Send time optimization
All in all, the ANOVA of our data suggests sending emails …
- especially on weekends rather than on Friday, Thursday or Monday to Wednesday,
- but also rather after 1pm than before.