Tag Archives: R

Popular Emoji combinations in email subject lines

Did you download the Email Emoji Cheat Sheet? If so, then you might also remember @dataNeel’s research on combined Emoji usage in subject lines. His map is cool on its own, but even cooler now that he has also published a Gephi export of the network in the comments section, so that everyone can play with the data.

Gephi is a popular free network visualization tool. I used it, for example, to create this and this plot of email experts on Twitter. It’s rather intuitive and comparatively fast, so give it a try. Want something more programmatic? Then you should move on to Python and/or R. Followers of this blog already know R and its superb visualization and data shaping capabilities.

Here’s one example of how you can use R to explore @dataNeel’s Emoji network. Continue reading

How to match an email list against a suppression list

Sometimes it’s necessary to select the email addresses from one list that are not part of another list. One use case would be a publisher matching their subscriber list against an advertiser’s suppression list. The suppression list holds users who don’t want to hear from the advertiser anymore, so it makes perfect sense to exclude them from the upcoming email send.

How can you do this kind of address matching efficiently on your computer? One way would be to use a database like Microsoft Access. A data manipulation tool like R offers another possibility. Here is a quick step-by-step guide for the latter: Continue reading
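The core of the matching is a set difference: keep every subscriber whose address does not appear on the suppression list. Here is a minimal sketch of that idea in Python rather than R, so it is not the step-by-step guide from the full post. The function names and the sample addresses are hypothetical, and treating addresses as case-insensitive is an assumption that fits most real-world lists but is not guaranteed for the local part of an address.

```python
def normalize(address: str) -> str:
    """Trim whitespace and lowercase the address (assumption: case does not matter)."""
    return address.strip().lower()


def remove_suppressed(subscribers, suppression_list):
    """Return subscribers that are not on the suppression list.

    Keeps the original order and drops duplicate subscriber entries.
    """
    suppressed = {normalize(a) for a in suppression_list}
    seen = set()
    kept = []
    for address in subscribers:
        key = normalize(address)
        if key and key not in suppressed and key not in seen:
            seen.add(key)
            kept.append(address.strip())
    return kept


# Hypothetical sample data
subscribers = ["anna@example.com", "Bob@Example.com ", "carl@example.com", "anna@example.com"]
suppression = ["bob@example.com", "dora@example.com"]

sendable = remove_suppressed(subscribers, suppression)
print(sendable)  # ['anna@example.com', 'carl@example.com']
```

Putting the suppression list into a set keeps each lookup at roughly constant time, which matters once both lists hold hundreds of thousands of addresses.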

Email benchmarks?! Forget about them

MailerMailer recently published their 2013 email marketing metrics report. Among other things, it lists email benchmarks like click rates, open rates, click-to-open rates and bounce rates by industry. Compare yourself: Continue reading

Can seed lists measure email deliverability? A practical guide by the numbers.

A seed list is a set of artificial email addresses that are interspersed into campaign dispatches. Software then checks the underlying seed list inboxes automatically after a certain time. Marketers use seed lists to monitor their deliverability. If all emails to the seed list accounts hit the junk folder or get lost, there is a certain probability that the same is true for the subscribers’ accounts. However, the reliability factor is often forgotten. I want to shed some light on the question of under what circumstances seed list results allow good estimates of inbox placement rates or spam filtering rates for a whole campaign… Continue reading
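One way to put a number on that reliability is a confidence interval around the observed inbox placement rate. The sketch below uses the Wilson score interval in Python. It is an illustration, not necessarily the method used in the full post, and it rests on a strong assumption: that the seed accounts behave like a random sample of the real subscribers. The seed counts are made up.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a proportion, e.g. inboxed seeds / all seeds."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical result: 45 of 50 seed accounts received the mail in the inbox
low, high = wilson_interval(45, 50)
print(f"Observed rate 90%, plausible range {low:.1%} to {high:.1%}")

# The same observed rate from only 10 seeds is far less informative
low_small, high_small = wilson_interval(9, 10)
print(f"With 10 seeds: {low_small:.1%} to {high_small:.1%}")
```

The interval narrows as the number of seed accounts grows, which is the practical reason a handful of seeds per provider says little about a whole campaign.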

Email pre-testing: Determining required group sizes and margins of error

When testing, it’s a good idea to have some formulas to hand. For instance, in split A/B/n test scenarios, you may want to inspect the relationship between sample size, level of significance, and power. Likewise, when renting lists, no one likes to buy a pig in a poke. Instead, the campaign has to be tested on a small segment first. Only if the test turns out to provide a good return on investment will the full run be booked.

However, the question is: how many recipients should one book for the test? Including too many recipients is costly if the list proves to be unprofitable. Renting too few subscribers, on the other hand, bears the risk that the test results are due to chance. Here’s a hands-on solution. Continue reading
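A common textbook starting point for this question is the sample size formula for estimating a single proportion, n = z² · p · (1 − p) / e², where p is the expected rate and e the margin of error you are willing to accept. The Python sketch below applies it. It is the standard normal approximation, not necessarily the calculation from the full post, and the function name and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(expected_rate: float, margin_of_error: float,
                         confidence: float = 0.95) -> int:
    """Recipients needed so that the observed rate lies within +/- margin_of_error
    of the true rate at the given confidence level (normal approximation)."""
    if not 0 < expected_rate < 1:
        raise ValueError("expected_rate must be between 0 and 1")
    if margin_of_error <= 0:
        raise ValueError("margin_of_error must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = z**2 * expected_rate * (1 - expected_rate) / margin_of_error**2
    return ceil(n)


# Worst case (rate of 50%), margin of +/- 5 percentage points
n_coarse = required_sample_size(0.5, 0.05)
print(n_coarse)  # 385

# Same worst case, but a tighter margin of +/- 1 percentage point
n_fine = required_sample_size(0.5, 0.01)
print(n_fine)  # 9604
```

Halving the margin of error roughly quadruples the required group size, so the margin you can live with largely determines the cost of the test.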

Determining statistical significance for email split tests, pt. 2: sample sizes

In one of the last posts, we addressed the chi-squared test for independence. With this test, we wanted to determine whether, for example, two subject lines have a significantly different impact on the absolute number of email opens. I provided you with a “flexible” solution. “Flexible” means that it can easily be extended to your needs. One extension would be to determine the required sample size for each of your test cells a priori. There’s no question that a split A/B test with only 2 x 100 recipients is less reliable than one with 2 x 1500 recipients. So here’s a solution for choosing the right sample size, again using R.
Continue reading
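To give a feel for the a priori calculation, here is a minimal sketch of the usual normal-approximation formula for comparing two proportions, n per cell = (z₁₋α/₂ + z₁₋β)² · (p₁(1 − p₁) + p₂(1 − p₂)) / (p₁ − p₂)². It is written in Python rather than R, so it is not the solution from the full post, and its results can differ slightly from what R's built-in power functions report. The open rates in the example are invented.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_cell(p1: float, p2: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Recipients needed in each test cell to detect a difference between
    rates p1 and p2 with a two-sided test (normal approximation)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return ceil(n)


# Hypothetical open rates: subject line A at 20%, subject line B at 25%
n_small_effect = sample_size_per_cell(0.20, 0.25)
print(n_small_effect)

# A bigger expected difference (20% vs. 30%) needs far fewer recipients
n_large_effect = sample_size_per_cell(0.20, 0.30)
print(n_large_effect)
```

The smaller the difference you want to detect, the larger each cell has to be, which is why a 2 x 100 split can only reveal very large effects.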

Email & data analysis: Does timing affect open rates? An analysis of variance (ANOVA)

Let’s get back to the study from my last blog post for a minute. The outcome on subject line lengths was not the only thing that caught my attention. (Remember: subject lines containing fewer than 10 characters are supposed to perform best.) Another thing I found intriguing was the claim that the day of the week on which a mailing is delivered has no effect on response rates. Can this be true? Let’s have a practical look at some email data. Continue reading