Google AdWords Customer Match lets you target email lists for remarketing purposes. Marketers upload their subscribers’ addresses, and Google matches them against signed-in users of Google Search, YouTube, or Gmail to show them targeted ads. This makes it easy to reach, for example, inactive email subscribers on other channels.
To create such a list, one has to meet certain requirements. For privacy reasons, it’s always a good idea to hash addresses as a means of pseudonymization. Below you’ll find a step-by-step guide to accomplishing that using RapidMiner, an open source predictive analytics platform, which may be an alternative to using Excel.
(I also put the process in my Dropbox, so that you can reproduce it by using “import process” in RapidMiner’s file menu.)
- First, download and install RapidMiner Studio, run it and start building a new process (CTRL-N).
- Select the “Read CSV” operator (or another operator that corresponds to your file format) in the Operators panel on the left, and drag & drop it onto your main process worksheet:
- Make sure the “Read CSV” operator is selected within your main process (left click). Then use the “Import Configuration Wizard” in the Parameters panel on the right to tell RapidMiner the format of your data file:
- Now, drag and drop the “Generate Empty Attribute” operator onto your worksheet, connect its input port with the “Read CSV” operator’s output port, and configure it to add a new column named “sha256” of type “polynominal”. You may need to switch to “Expert View” (press F4) in order to select the type:
- Add the “Select Attributes” operator, connect it to the “exa” output port of the “Generate Empty Attribute” operator, and configure it to return only a single attribute, namely “sha256”:
- Add the “Write CSV” operator, connect it with “Select Attributes” and with the “res” port of the process on the right, and set up its parameters. You have to deselect the options “write attribute names” and “quote nominal values”. I chose “hashes.csv” as a filename:
- Look for the “Execute Script” operator and drop it on (!) the connecting wire between “Generate Empty Attribute” and “Select Attributes”. Edit its text and paste in the following code snippet. For each row, it normalizes the value in the column named “Email” (trimming whitespace and converting to lowercase) and writes its SHA-256 hash into our newly generated column named “sha256”. (If the column that contains the addresses has a different name, rename it to “Email” beforehand using the “Rename” operator.)
This is the relevant code snippet in plain text:
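import java.security.MessageDigest;

// Fetch the example set delivered to the operator's input port.
ExampleSet exampleSet = operator.getInput(ExampleSet.class);
MessageDigest m = MessageDigest.getInstance("SHA-256");
for (Example example : exampleSet) {
    // Normalize: strip surrounding whitespace and convert to lowercase.
    String normEmail = example["Email"].trim().toLowerCase();
    // Name the charset explicitly so the result does not depend on the platform default.
    m.update(normEmail.getBytes("UTF-8"));
    // digest() returns the 32 hash bytes and resets the digest for the next row.
    byte[] digest = m.digest();
    BigInteger bigInt = new BigInteger(1, digest);
    String hashtext = bigInt.toString(16);
    // Zero-pad to 64 hex characters, because toString(16) drops leading zeros.
    while (hashtext.length() < 64) {
        hashtext = "0" + hashtext;
    }
    example["sha256"] = hashtext;
}
return exampleSet;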
- Now, run the process by pressing F11. If you set it up correctly, it will lead you to the Results Perspective (F9):
- Et voilà, the hash for examPLe@GMAIL.com is 264e53d93759bde067fd01ef2698f98d1253c730d12f021116f02eebcfa9ace6, just like in the Google help. You are ready to upload the hashes in “hashes.csv” to AdWords:
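To cross-check individual lines of “hashes.csv” outside of RapidMiner, a small standalone Java program can apply the same two steps as the script: normalize the address, then hash it with SHA-256. The following is only a minimal sketch under stated assumptions. The class name EmailHasher and its method names are made up for illustration and are not part of RapidMiner or AdWords, and it formats each byte with "%02x" instead of using the BigInteger padding loop from the script.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper for cross-checking the RapidMiner output; not part of RapidMiner.
class EmailHasher {

    // Same normalization as the script: strip surrounding whitespace, then lowercase.
    static String normalize(String email) {
        return email.trim().toLowerCase(Locale.ROOT);
    }

    // SHA-256 of the UTF-8 bytes of the text, as 64 lowercase hex characters.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // "%02x" keeps the leading zero of each byte, so no padding loop is needed.
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a standard algorithm that Java platforms are required to support.
            throw new IllegalStateException(e);
        }
    }

    // Normalize first, then hash, mirroring the order used in the script.
    static String hashEmail(String email) {
        return sha256Hex(normalize(email));
    }

    public static void main(String[] args) {
        // Prints the hash of the example address used above.
        System.out.println(hashEmail(" examPLe@GMAIL.com "));
    }
}
```

If the process is set up as described, the value printed for an address should equal the corresponding line in “hashes.csv”. Differences usually point to a normalization mismatch, such as stray whitespace or a different character encoding.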