A seed list is a set of artificial email addresses that are interspersed into campaign dispatches. Software then checks the underlying seed list inboxes automatically after a certain time. Marketers use seed lists to monitor their deliverability: if all emails to the seed list accounts hit the junk folder or get lost, there is a certain probability that the same happened to the subscribers’ accounts, too. However, the reliability factor is often forgotten. I want to shed some light on the question of when seed list results allow good estimates of inbox placement rates or spam filtering rates for a whole campaign…
Seed list size and structure
The rotating word cloud that you see above (sorry, mobile users 😉) could be an extract from an email seed list. It shows common domain parts, grouped by string similarity. Text size represents frequency within the list. Hotmail.com and Yahoo.com are the biggest email providers, so they get the biggest letters.
Now, if you list your subscriber count grouped by email domain, you’ll probably see countless rows, some of which are more important than others. To give you a good cross-section of campaign deliverability, a seed list has to cover at least your most popular domains, unless of course you want to focus on just one specific provider such as gmail.com.
To meet this requirement, good seed lists for global senders contain 500 or more addresses. Both structure and size are quality features. A seed list may contain 10,000 email addresses, but it is worthless if it is completely inconsistent with your domain structure. Conversely, a seed list that includes every single one of your domains, but each only once, may not provide a good estimate. This leads us to the next point.
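To make the structure requirement tangible, here is a minimal sketch of a coverage check. It computes the share of subscribers whose domain appears at least once in the seed list. The function names and the toy addresses are hypothetical and only serve as an illustration; a real check would read both lists from your own data.

```python
from collections import Counter


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def seed_coverage(subscribers, seeds) -> float:
    """Share of subscribers whose domain occurs at least once in the seed list."""
    subscriber_domains = Counter(domain_of(a) for a in subscribers)
    seed_domains = {domain_of(a) for a in seeds}
    covered = sum(n for d, n in subscriber_domains.items() if d in seed_domains)
    return covered / sum(subscriber_domains.values())


# Hypothetical toy data, for illustration only
subscribers = ["a@gmail.com", "b@gmail.com", "c@yahoo.com", "d@example.org"]
seeds = ["seed1@gmail.com", "seed2@yahoo.com"]

coverage = seed_coverage(subscribers, seeds)
print(f"{coverage:.0%} of subscribers are on domains covered by the seed list")
```

Note that coverage only tells you whether a domain is represented at all, not whether it is represented often enough for a reliable estimate.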
Approaching estimation errors
Statistically speaking, seed lists are samples of subscriber lists, and the subscriber list is the population. Extrapolating from a sample to the whole population always involves error, but that error can be bounded. How many Gmail addresses does the seed list need in order to estimate an interval for, say, the inbox placement rate at @gmail.com? This depends on the number of Gmail addresses in the subscriber base, on how confident we want to be, and on the inbox placement rate that we assume ex ante from our experience or from benchmarks.
Sounds too abstract? Let’s explore these dependencies using the formula from our earlier post on email pre-testing. A practical example could look like this:
- I check my Return Path Sender Score at senderscore.org for the IP that sent me my last Weekly Email Marketing Recap newsletter.
- For a (btw nearly perfect 🙂) Sender Score of 98, the Sender Score Benchmark Report suggests an inbox placement rate of 93% (= 0.93) within Gmail’s inboxes.
- For determining the sample size, I want to allow a margin of error of only 1% (= 0.01), so that the estimated inbox placement rate lies between 92% and 94% (= 93% +/- 1%).
- Furthermore, I want to be 95% confident in my estimate, i.e. the significance level is 0.05 (= 5% = 100% – 95%).
- Let’s assume we have 1,000 @gmail subscribers.
- How many @gmail addresses do I need in my seed list now? Easy! We just plug the values into our formula.
- We would need a seed list that includes at least 715 @gmail addresses.
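The formula itself is not reproduced in this text. As a sketch, the standard sample size formula for a proportion with finite population correction, n = N·z²·p·(1−p) / ((N−1)·e² + z²·p·(1−p)), gives the same result for the values above. That this is exactly the formula from the earlier post is an assumption; the function name and parameter names are made up for this example.

```python
import math
from statistics import NormalDist


def required_seed_size(population, expected_rate, margin, confidence):
    """Sample size for estimating a proportion in a finite population.

    Assumed formula (normal approximation, finite population correction):
        n = N * z^2 * p * (1 - p) / ((N - 1) * e^2 + z^2 * p * (1 - p))
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance = expected_rate * (1 - expected_rate)
    n = (population * z**2 * variance) / (
        (population - 1) * margin**2 + z**2 * variance
    )
    return math.ceil(n)  # round up: we need whole addresses


needed = required_seed_size(
    population=1000,     # @gmail subscribers
    expected_rate=0.93,  # inbox placement rate assumed ex ante
    margin=0.01,         # +/- 1 percentage point
    confidence=0.95,
)
print(needed)  # 715
```

Playing with `margin` shows how quickly the requirement drops: the margin enters squared, so relaxing it has a much stronger effect than lowering the confidence level.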
Wow, that’s a huge number. Clearly too many for most of us. A reasonable conclusion would be to accept a higher margin of error. Let’s examine that.
Assume our seed list contains 30 @gmail addresses and reports an inbox placement rate of 88%. What can we conclude from this at a 10% significance level, again with 1,000 Gmail subscribers? No guesswork, let’s plug the numbers in.
The output suggests an error margin of about +/- 10%. If the seed list reports an 88% inbox placement rate, we can conclude with 90% confidence that the true inbox placement rate in our campaign was between 78% and 98%.
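The calculation behind such a result can be sketched as follows. It assumes the usual normal approximation for a proportion, multiplied by a finite population correction; whether the original calculator works exactly this way is an assumption, and the function name is made up for this example.

```python
import math
from statistics import NormalDist


def margin_of_error(sample_size, observed_rate, population, confidence):
    """Margin of error for an observed proportion in a finite population."""
    # Two-sided critical value, e.g. about 1.645 for 90% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(observed_rate * (1 - observed_rate) / sample_size)
    # Finite population correction: the sample is not tiny relative to N
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * fpc


margin = margin_of_error(
    sample_size=30,      # @gmail addresses in the seed list
    observed_rate=0.88,  # inbox placement rate reported by the seed list
    population=1000,     # @gmail subscribers
    confidence=0.90,
)
low, high = 0.88 - margin, 0.88 + margin
print(f"+/- {margin:.1%}, interval {low:.1%} to {high:.1%}")
```

The margin comes out at roughly 9.6 percentage points, which rounds to the +/- 10% mentioned above. With only 30 addresses and a rate this close to 100%, the normal approximation is itself rather rough, so the interval should be read as a ballpark figure.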
Well, that might be good to know. But remember that such wide intervals make optimization difficult, because you just can’t track the results precisely over time. The only remedies are adding more seed list addresses or lowering the confidence level (although online marketing isn’t medical research, you shouldn’t go below 80%). Keep that in mind when you look at the next fancy deliverability optimization chart on the web…
User behavior & the future
Last but not least, seed lists fail to account for individual user actions. Behind a seed list account, there is no one who opens or clicks emails of personal interest. Instead, there’s just a computer program that scans folders.
The problem with that is that modern spam filters are said to include past user behavior in their filtering decisions. An email could therefore make it to the inbox of recipient A because he always clicked those emails from the sender, while the exact same newsletter is sorted into the junk folder of recipient B because the newsletters have always been irrelevant graymail to him. Seed lists cannot capture this at all, which may raise further doubts about their validity.
Interesting new phenomena that could make up for this lack of behavioral data are webmailer apps and desktop plug-ins. Think of rather new services like PowerInbox (see “Quo vadis, email?”) or OtherInbox. Their inbox add-ons already allow very deep insights into 800,000 and more panel inboxes. They yield new measures, like the “read rate” or the “deleted without read rate”, or the true inbox placement rate for individual users. Today, these services might be interesting only for very large senders. But as the panels grow, they might become an option for medium senders, too. Let’s see what the future holds.