How to match an email list against a suppression list

Sometimes it’s necessary to select a list of email addresses which are not part of another list of email addresses. One use case would be a publisher matching his subscriber list against a suppression list of an advertiser. The suppression list holds users who don’t want to hear from the advertiser anymore. So it makes perfect sense to exclude them from the upcoming email send.

How can you achieve such an address matching efficiently on your computer? One way would be to use a database like Microsoft Access. A data manipulation tool like R offers another possibility. Here is a quick step-by-step guide for the latter one:

Step-by-step guide:

  1. Install R (what’s that?), one of my essential blogging tools, on your computer and run rgui.exe.
  2. Install the data.table package (if you haven’t done yet) an load it, which gives you the magnificent fread() and %chin%:
    2014-10-17_012053
  3. Download my demo subscriber list and save it under the name “list.subscribers – file6005f2cae.csv”, download the suppression list and save it under “list.suppression – file6007f2a27dc.csv”.
  4. Load both files into memory using fread(), then inspect them using View():
    2014-10-17_015936
    (Remember to set the correct file paths.)
  5. Now, matching is fast & easy:
    2014-10-17_020254
    From the original 100000 email addresses, 49970 are still left after matching against the suppression list. R shows you the head and the tail of the data set. Note that the command used tolower() to take care of different cases, because otherwise “vnqjs@gmail.com” and “VNQJS@GMAIL.COM” would be treated as being different. (Additional normalization steps might be necessary in your case, like stripping off leading white spaces in addresses like “   vnqjs@gmail.com”.)
  6. Saving the clean result is easy, too – write.csv() and his friends write.csv2() and write.table() do the job:
    2014-10-17_021607
    Here, getwd() outputs the current working directory, where “my.csv” has been saved, and shell.exec() opens it.

What about hashes?

By the way: what if you have MD5 hashes of lower case email addresses in your suppression list instead of the email addresses themselves? I mean, from a security and data privacy perspective, this would be the preferable scenario. Email addresses are personal data, whereas hashes are – more or less – anonymous data.

No problem for R. First, load the MD5 suppression list, save it under “list.suppression.md5 – file6005fb21aba.csv”. Then follow these instructions:Match aggainst md5 suppression listThe hash function digest needs some time for calculations, but in the end, the result is the same…

The code:

Last but not least, here’s a working code example for you to copy paste into the R console:

setInternet2(use = TRUE)
filenames <- c("list.subscribers – file6005f2cae.csv" = "https://docs.google.com/spreadsheets/d/1oge_Up3t5_6anrX-gZongXwZxPtHINapDOoTLiTV5n4/export?format=csv&id=1oge_Up3t5_6anrX-gZongXwZxPtHINapDOoTLiTV5n4&gid=694984303",
               "list.suppression – file6007f2a27dc.csv" = "https://docs.google.com/spreadsheets/d/1_c5X0dLuqUCC3uwDvI52rMmvMQoI-SfRdeqBWBIAKw8/export?format=csv&id=1_c5X0dLuqUCC3uwDvI52rMmvMQoI-SfRdeqBWBIAKw8&gid=437463345")
names(filenames) <- file.path(tempdir(), names(filenames))
for (x in 1:length(filenames))
  download.file(url = filenames[x], destfile = names(filenames)[x])

if (!library(data.table, logical.return = TRUE)) {
  install.packages("data.table")
  library(data.table)
}
subscribers <- fread(names(filenames)[1])
blacklist <- fread(names(filenames)[2])
(clean <- subscribers[!tolower(email) %chin% blacklist$email])
Enjoyed this one? Subscribe for my hand-picked list of the best email marketing tips. Get inspiring ideas from international email experts, every Friday: (archive♞)
Yes, I accept the Privacy Policy
Delivery on Fridays, 5 pm CET. You can always unsubscribe.
It's valuable, I promise. Subscribers rate it >8 out of 10 (!) on average.

4 Responses to How to match an email list against a suppression list

  1. This filtering technique is a very useful and cost cutting way to the mail list customers.

  2. Pingback: Popular Emoji combinations in email subject lines

  3. Do you know of any software that can bulk compare two list in excess of 1 million emails? I’ve been playing around with R and it seems to have an issue with Excel’s built in limitation of rows. Great article BTW.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.