New insights into Gmail’s spam filtering

Gmail’s email sorting and prioritization are based on a complex set of rules, clustering, machine learning schemes, reputation and user feedback. For more transparency, as of yesterday the webmailer shows not only why it classified certain emails as important (“priority inbox”) but also why certain messages were moved to the junk folder. This might also give email senders some valuable new insights for troubleshooting inbox placement issues, which, as reported by Return Path, have become more frequent over the past few weeks.

Use Gmail spam notes for diagnosis

First, the bad news: unlike SpamAssassin, for example, which usually writes its rule hits into the email header, Gmail does not provide such a detailed spam scoring report. The good news, however, is that even the general explanation attached to each filtered message can help marketers narrow down inbox placement problems and perhaps debunk some myths.
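For comparison, here is a minimal Python sketch of the kind of report SpamAssassin leaves behind: it reads the `X-Spam-Status` header that SpamAssassin typically adds and extracts the verdict, the score, the threshold and the names of the rules that fired. The sample message, its score and its rule names are made up for illustration, and the parsing assumes the usual `score=… required=… tests=…` layout; other configurations may format this header differently.

```python
import re
from email import message_from_string

# Made-up sample message. SpamAssassin typically adds an X-Spam-Status header
# like this one; the score and rule names here are illustrative only.
SAMPLE = """\
From: newsletter@example.com
To: reader@example.org
Subject: Weekly tips
X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,
\tDKIM_VALID,HTML_MESSAGE autolearn=ham version=3.4.6

Hello!
"""


def spam_report(raw_message):
    """Return (is_spam, score, required, rule_names), or None if no header.

    Assumes the common 'Yes|No, score=X required=Y tests=A,B,...' layout.
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    # Unfold the header: collapse whitespace, then drop spaces after commas
    # so the comma-separated tests= list becomes a single token.
    status = re.sub(r"\s+", " ", str(status)).strip()
    status = re.sub(r",\s+", ",", status)
    is_spam = status.split(",", 1)[0] == "Yes"
    score = float(re.search(r"score=(-?\d+(?:\.\d+)?)", status).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", status).group(1))
    tests = re.search(r"tests=(\S*)", status)
    # SpamAssassin writes tests=none when no rule fired.
    if tests is None or tests.group(1) == "none":
        rules = []
    else:
        rules = tests.group(1).split(",")
    return is_spam, score, required, rules


if __name__ == "__main__":
    print(spam_report(SAMPLE))
```

Gmail offers nothing comparable to this per-rule list, which is why its one-line notes are the best clue a sender gets there.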

When browsing through my Gmail junk folder yesterday, I found the following five filtering reasons:

  1. It’s similar to messages that were detected by our spam filters
  2. We’ve found that lots of messages from win@thebigprizedraw.com are spam
  3. It contains content that’s typically used in spam messages.
  4. Be careful with this message. Similar messages were used to steal people’s personal information. Unless you trust the sender, don’t click links or reply with personal information.
  5. Be careful with this message. Our systems couldn’t verify that this message was really sent by youtube.com. You might want to avoid clicking links or replying with personal information.

Three rules represent core filter mechanisms

What do they mean, especially reasons 1 to 3, which obviously concern unwanted but harmless emails? Beyond the information provided in the Gmail help, one can only speculate. I spot-checked the Return Path Sender Score accepted rates (senderscore.org, see the guide from HubSpot) of some junked email senders and, being no deliverability expert, came up with the following scheme:

  1. This rule seems to be primarily network related: for example, new (“cold”) IPs without any historical reputation, or emails sent from networks that spam with affiliate links and get-rich-quick offers. When I checked these senders’ Return Path Sender Score, most of them had an accepted rate below 80%. Also interesting: quite often only some issues of a newsletter land in the junk folder by hitting this rule, while others make it to the inbox. This possibly shows that reputation changes dynamically over time and that spam/ham probabilities are sometimes close to 50/50. This rule trumps rules 2 and 3. If your mail hits it, you should perhaps speak with your ESP first.
  2. The second rule seems to be sender related, as the note names the sender’s email address. I saw it mainly on emails from no-name senders I signed up with years ago in connection with prize draws. They send regular newsletters containing pseudo content and mostly dedicated advertisements. Such mailings surely draw many spam complaints and get deleted unopened. The sender possibly also emails numerous inactive addresses. In the end, he gets only low open and click rates. If this rule hits, there’s no need to contact your ESP. Instead, remove inactive addresses that haven’t opened or clicked for a while, obtain stronger permission at sign-up, meet (or better, exceed) expectations, invite recipients to respond and to whitelist you, and provide easy unsubscribe options.
  3. The third rule is content related. As far as I have noticed, it hardly matches anything but the usual pharmacy or casino spam. Trigger word vectors include “casino, bonus, dollar, euro, million, player, play” in the email header and body. So if you still worry a lot about content filtering, watch for this note and revise your copy if necessary.
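To make the scheme above a little more concrete, here is a rough Python sketch that sorts Gmail’s junk-folder notes into the three speculative categories and pairs each with the suggested first step. The phrase fragments are copied from the notes quoted earlier; Gmail may word them differently or change them at any time. The trigger-word set simply mirrors the words listed under rule 3 and is not Gmail’s actual list, and the 180-day inactivity window in the hypothetical `prune_inactive` helper is an arbitrary example threshold, not a Gmail rule.

```python
import re
from datetime import date, timedelta

# Hypothetical mapping of Gmail junk-folder notes to the three speculative
# categories above. Fragments are copied from the notes quoted in this post;
# Gmail may word them differently or change them at any time.
CATEGORIES = [
    ("network", "similar to messages that were detected by our spam filters",
     "Check your IP reputation (e.g. Sender Score) and talk to your ESP."),
    ("sender", "lots of messages from",
     "Prune inactive addresses and strengthen permission at sign-up."),
    ("content", "content that's typically used in spam",
     "Review your copy for typical spam vocabulary."),
]

# Words listed under rule 3; an illustration, not Gmail's actual list.
TRIGGER_WORDS = {"casino", "bonus", "dollar", "euro", "million", "player", "play"}


def triage(note):
    """Return (category, suggested first step) for a Gmail spam note."""
    # Lower-case and normalize typographic apostrophes before matching.
    text = note.lower().replace("\u2019", "'")
    for category, fragment, action in CATEGORIES:
        if fragment in text:
            return category, action
    return "other", "Not one of the three filter notes discussed here."


def trigger_hits(subject, body):
    """Return the trigger words found in subject or body (whole words only)."""
    words = set(re.findall(r"[a-z]+", f"{subject} {body}".lower()))
    return sorted(words & TRIGGER_WORDS)


def prune_inactive(subscribers, today, max_idle_days=180):
    """Keep addresses that opened or clicked within max_idle_days.

    subscribers: iterable of (address, date of last open/click, or None).
    The 180-day default is an arbitrary example threshold.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return [addr for addr, last in subscribers
            if last is not None and last >= cutoff]


if __name__ == "__main__":
    print(triage("We’ve found that lots of messages from win@thebigprizedraw.com are spam"))
    print(trigger_hits("Your casino bonus", "Play now and win a million!"))
    print(prune_inactive([("a@example.com", date(2013, 5, 1)),
                          ("b@example.com", None)], today=date(2013, 6, 1)))
```

For example, `triage("It contains content that’s typically used in spam messages.")` returns the `"content"` category, while the two phishing-style warnings (notes 4 and 5) fall through to `"other"`, since they are not part of the scheme.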

Anything to add, or any suggestions?

For German readers:

You might also want to check out the guide to Gmail deliverability on emailmarketing.de as well as a general visualization to better understand modern reputation-based filtering.


4 Responses to New insights into Gmail’s spam filtering

  1. New insights into Gmail’s spam filtering http://t.co/wtigbyin

  2. Hi René,

    Thanks for your post.
    It was very helpful for me with my investigation of spam mark issues of our emails.
    By the way, have there been any key conceptual updates to Gmail’s filters in 2013?

    • Hi Yuri,
      glad it was useful. Conerning Gmail, one might consider the new inbox as a conceptual change. Some see the “promotions” tab as a 2nd spam folder. Besides that I’d say the filter or the underlying model(s) changes continuously by itself. It’s artificial intelligence/machine learning.

  3. I think more user profiles will make filters more personal. I need to do some research on how to make the automated email marketing process more personal…
