How to personalize newsletter send times (aka “send time optimization”) using AI

Google Trends data suggests that December and May are customization months. Particularly now, towards the end of the year, there is a massive seasonal increase in interest in the topic of “personalization”. No wonder: end-of-year business is crucial for many, and tailoring emails to subscriber preferences is one of the most effective ways to boost sales.

In fact, email is sort of the mother of digital personalization. Data sovereignty remains in-house, and the channel is highly customizable. Basically, every element can vary based on subscriber attributes, whether it is the subject line, the offer or the timing.

But wait a moment… Is email timing really customizable? Why then do all recipients receive a newsletter at the same time, say 8 o’clock in the morning? The truth is that although timing is considered a classic success factor, senders still align it with a rigid standard schedule rather than with individual recipient preferences.

This post outlines an algorithm to determine subscriber send time preferences.

Why optimize send times at all?

Some subscribers prefer to receive newsletters early in the morning; others take the time to scan their inboxes in the evening. From the sender’s point of view, it is desirable to always deliver at the right time, because the email is then placed on top of all others and enjoys high visibility.

If, however, delivery is too early or too late, new incoming emails will gradually push the message down and out of the focus area, because inboxes are traditionally sorted chronologically. Lower visibility means a decrease in response rates.

But how do I know the right time for every recipient? Well, you could have machines automatically experiment with different times of day, or you could use them to look into response data from past campaigns. This post examines the latter, using past response data from my Friday newsletter for illustration.

General conditions of the experiment

For the past six years, I have sent a weekly roundup to about 2,000 recipients (almost) every Friday at about 5 p.m. CET. The subscriber list is a colorful mix of people interested in email marketing: 47% come from the USA, 10% from Germany and 7% from the UK, so we are dealing with different time zones. Each newsletter contains 15 to 30 curated articles about email marketing that I consider worth reading.

My key question for optimizing sending times would be: When would a recipient expect the newsletter and take the most time to click on many posts?

An STO algorithm in a nutshell

Here’s a proposal for a send time optimization recipe:

1. Gather historic response data:
First, look at past click timestamps per recipient (e.g. “A clicked on link1 at 2017-12-18 17:32:01”) for the last – say – twelve months. Do not take opens into account: compared to clicks, opens are not only a less accurate performance indicator, but also one that is further away from valuable conversions.

2. Control for send date:
Consider only click times that occurred outside the dispatch window of the weekly email marketing roundup (Friday from 5 p.m. onwards). This controls for the effect of the chosen campaign send time of 5 p.m. Why? For every send, most of the response arrives immediately after dispatch, but this time is not necessarily representative of the potential optimum time for a recipient; maybe 8 a.m. would have yielded even higher click rates. A drawback of excluding Friday 5 p.m. to 11 p.m. is that it reduces the proportion of subscribers for whom an optimal send time can be calculated, because some simply haven’t clicked on any other day yet.

3. Group click sprees/sessions:
Combine individual click times into click sprees/sessions per recipient and campaign. A gap of more than 20 minutes between clicks marks the start of a new session; for example, clicks at 17:05, 17:12 and 17:50 form two sessions, one starting at 17:05 and one at 17:50. Save the start time of each session and its duration, both of which are intended for weighting the session: the more recent and the longer, the better in terms of optimization.

4. Cluster sessions (AI/unsupervised machine learning):
Now clump together click sprees that are close together on a time-of-day timeline, and determine a representative time of day for each cluster. Each representative time will later be a potential dispatch time for that subscriber. OPTICS is a suitable clustering algorithm. One difficulty lies in the fact that times are one-dimensional (only x values): midnight (0:00) is numerically the furthest away from 23:00, yet on the clock the difference is not 23 hours but only one hour. This shortcoming can be remedied by projecting the time values onto a circle in two-dimensional space (x and y values); see the sketch after this list. Further dimensions could be added, for example for weekdays versus weekends (x, y and z values) or for each weekday individually. It is to be expected that optimal send times at the weekend differ from those during the week. The start of the week may also differ from the rest, as mailboxes are usually full on Monday morning, so it may be useful to include this information as another clustering feature.

5. Apply weighting:
All potential dispatch-time clusters must now be evaluated to determine the best one for each recipient. To do so, one might look at the number of click sprees in a cluster. In addition, more recent click sprees should be weighted higher than old ones, because preferences may change over time. Finally, longer click sprees should be weighted higher than shorter ones (a possible weighting scheme is part of the sketch after this list).

The result could then be a weighted mean time for each cluster and some sort of cluster quality score. The weighting scheme and the OPTICS settings are tuning parameters that influence this result.
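To make steps 4 and 5 more tangible, here is a small, self-contained R sketch. Everything in it is illustrative: the click-spree numbers are made up, the CRAN dbscan package stands in for Weka’s OPTICS to keep the example dependency-light, and the 180-day half-life and 60-minute duration cap are arbitrary tuning choices rather than the exact setup behind the results in the next section.

    library(dplyr)
    library(dbscan)  # CRAN package; stands in for Weka's OPTICS here

    ## Made-up click sprees of one recipient: start time of day (hours),
    ## age in days and duration in minutes
    sprees <- tibble::tibble(
      hour_of_day = c(8.1, 8.4, 7.9, 17.6, 17.2, 23.8, 0.2),
      age_days    = c(300,  20,  10,    5,  200,   40,  35),
      dur_mins    = c(  2,  12,   8,   25,    3,    6,   9)
    ) %>%
      mutate(ang = 2 * pi * hour_of_day / 24)  # time of day as an angle

    ## Step 4: project times onto the unit circle before clustering,
    ## so that 23:48 and 0:12 end up close together
    sprees$cluster <- dbscan(cbind(sin(sprees$ang), cos(sprees$ang)),
                             eps = 0.3, minPts = 1)$cluster

    ## Step 5: weight each spree by recency (half-life) and duration (capped),
    ## then score the clusters and derive a weighted circular mean send time
    sprees %>%
      mutate(w = 0.5^(age_days / 180) * pmin(dur_mins, 60) / 60) %>%
      group_by(cluster) %>%
      summarise(
        score     = sum(w),
        send_time = (atan2(sum(w * sin(ang)), sum(w * cos(ang))) %% (2 * pi)) * 12 / pi,
        .groups   = "drop") %>%
      arrange(desc(score))

Note how the sprees at 23.8 and 0.2 hours land in one cluster thanks to the circular projection; fed with the raw hour values, a clusterer would keep them almost 24 hours apart.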

Results

In the figure below, the best cluster for each of the 25 sample recipients is marked red, and a blue line indicates the optimized send time. Points show past click sprees: the higher the point, the longer the click spree lasted and the more valuable it is; likewise, the brighter the point, the more recent the click spree and the more valuable it is. Last but not least, the point shape shows the day of the week on which the spree took place.

Just as interesting is a look at the aggregated results instead of individual ones. The figure below shows the preferences of German readers from Monday to Thursday. If you had to choose one send time for this group, it suggests deploying early in the morning at 9:30 or at the beginning of lunchtime. There is another small peak at 15:30 and a bigger one at 17:30.

Final thoughts

Personalization has been a megatrend for years. Nevertheless, current email marketing practices leave a lot to be desired when it comes to the important success factor “timing”. This post, which has been on my desk as a draft for a long time, showed that newsletter timing does not have to be that rigid.

Or does it?

Well, there are, as far as I know, not many email service providers that let you set individual send times for each recipient. This is also part of the reason why I did not run tests.

So it remains to be seen how well the recipe works. A provisional solution that came to my mind is to look at the root of the squared deviations between the calculated send times and the most recent click-spree times. However, this is by no means a substitute for a good old split test.
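For illustration, such a check could look like the following sketch, assuming the calculated send times and the latest click-spree times are both given as hours of the day; the circular difference keeps 23:30 versus 0:30 at one hour instead of 23.

    ## Root of the mean squared (circular) deviations, in hours
    circ_diff_hours <- function(a, b) {
      d <- abs(a - b) %% 24
      pmin(d, 24 - d)
    }
    rmse_hours <- function(calculated, latest_spree) {
      sqrt(mean(circ_diff_hours(calculated, latest_spree)^2))
    }

    ## Made-up example: calculated 9:30 and 17:30, latest sprees
    ## at 10:00 and 16:00 -> off by about 1.12 hours
    rmse_hours(c(9.5, 17.5), c(10.0, 16.0))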

Anyway, what’s your experience with send time optimization (STO)?

Update 2018-09-30:

There’s a follow-up post online now, which includes a full code example and sample data. This way, you can easily reproduce the results yourself and toy around with different algorithms.

Enjoyed this one? Subscribe for my hand-picked list of the best email marketing tips and get inspiring ideas from international email experts, every Friday (archive).

12 Responses to How to personalize newsletter send times (aka “send time optimization”) using AI

  1. Great article René!

    Would you mind sharing some of the providers (available to the general public) that allow per-user delivery times?

    Which tools (marketing automation) can be used to calculate and/or set those times?

    TY!

    • Glad you liked it. Check out e.g. Emarsys, which I also referenced in the post (“Craftlab”), or AudiencePoint. The first provider where I saw a feature like that was Silverpop (back in 2009). If you want to calculate the times on your own, you can basically use any tool that lets you crunch data effectively. I used R and Weka. Steep learning curve, but I think it’s worth it.

  2. Hi, Rene.
    I am a master’s student majoring in data analytics at the Santa Clara Leavey School of Business. It was very impressive to see your analysis. Actually, as a school project, I am thinking of finding the best algorithm for STO. But it was hard to find any related code resources. Would you mind sharing your R analysis? After I finish my research, I will share mine as well. If you are not willing to share it, that’s fine, too.
    Thank you so much!

  3. Hi Rene,
    Sorry for the late reply. Thank you so much!
    I would be more than happy to see your analysis for steps 3 and 4 and the results part, if possible.
    Since we have little experience with this kind of project, it would be a good reference for us.
    Let’s keep in touch by email! Thanks.

    • For identifying click sprees, I used this magrittr pipe, which needs a data frame with at least userid, campaignid and datumzeit (datumzeit means datetime: the date and time when a response occurred, as POSIXct):

      ## Identify click sprees ("sessions") based on how far away clicks are in time
      identify_clicksprees <- . %>%
        arrange(userid, campaignid, datumzeit) %>%
        group_by(userid, campaignid) %>%
        mutate(
          difftime = datumzeit - lag(datumzeit, default = as.POSIXct(0, origin = "1970-01-01")), # epoch default: the first click always opens a new spree
          difftime = as.numeric(`units<-`(difftime, "secs"))) %>%
        mutate(clickspree_id = cumsum(difftime > 60*20)) %>% # a gap of more than 20 minutes (60 s x 20) starts a new spree
        group_by(clickspree_id, .add = TRUE) %>%
        mutate(
          clickspree_nclicks = n(),
          clickspree_nmins = abs(Reduce(`-`, range(datumzeit))),
          clickspree_nmins = as.numeric(`units<-`(clickspree_nmins, "mins")),
          clickspree_ageDays = difftime(Sys.time(), min(datumzeit)),
          clickspree_ageDays = as.numeric(`units<-`(clickspree_ageDays, "days"))) %>%
        ungroup

      For the clustering, I used:

      ## Cluster the timestamps of a user
      #'
      #' @param datumzeit POSIXct, response timestamp
      #' @param eps DBSCAN Epsilon, maximum distance for merging points into clusters
      #' @param othervars properly scaled matrix of other variables to include into clustering, besides the time of day of the response (e.g. age of click, weekday, ...)
      getClusters <- function(datumzeit, eps = 0.5, othervars = NULL) {
        # datumzeit <- seq(as.POSIXct("2017-12-18 00:00:00"), as.POSIXct("2017-12-18 23:00:00"), "1 hour")
        h <- as.hms(datumzeit)
        h <- hour(h)+minute(h)/60
        ha <- 2*pi*h/24
        m <- cbind(x = sin(ha), y = cos(ha))
        # as_data_frame(m) %>% ggplot(aes(x,y)) + geom_point() + theme_minimal()
        # data.frame(x = 0:23) %>% ggplot(aes(x)) + geom_rug() + theme_minimal() + theme(panel.grid = element_blank())
        if (!is.null(othervars)) m <- cbind(m, othervars)
        res <- dbscan(m, c("-E", eps, "-M", 1))
        return(res$class_ids)
      }

      I also used the following libraries:

      Sys.setenv("WEKA_HOME"="C:\\Users\\Rene\\Weka")
      library("rJava")
      library("RWekajars")
      library("RWeka")
      WPM("load-package", "optics_dbScan")
      dbscan <- make_Weka_clusterer('weka/clusterers/DBSCAN')
      library(tidyverse)
      library(lubridate)
      library(hms)

      Let me know if you have questions. The code is a bit messy… I also left in some commented debugging things.

      • Hi Rene,
        Thanks for sharing this.

        I have a question: if we use customers’ historical email open/click data, why don’t you just use simple stats to get the favorite day and time of click?

        For instance, user A received 100 emails in total, and he opened and clicked 80 of them. Of those 80 clicks, 30 occurred on Saturday between 17:00 and 20:00, 20 on Sunday between 11:00 and 14:00, 20 on Wednesday between 21:00 and 24:00, and the remaining 10 at other times. From this we know the user is most likely to open email on Saturday between 17:00 and 20:00.

        Regards
        Jie

        • Hi Jie,

          Great question, thanks. Discretizing could make things easier and more efficient. However, can it be expected to be as effective? E.g., where do you draw the boundaries, and when do you send? Three hours is a rather huge time span. What if the 30 clicks occurred between 19:30 and 20:00, and you send at 17:00? And how do you account for 20 of these clicks having happened a year ago around 19:30, while 10 occurred during the last month, after a reactivation, around 19:55?

          In the end, the binning approach might work well; I did not try it, so please report if you gain some insights. 🙂
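          If you try it, a quick-and-dirty version could be as simple as counting clicks per weekday and 3-hour bin. An untested sketch, assuming a data frame clicks with the userid and datumzeit columns from my earlier comment:

              clicks %>%
                mutate(
                  weekday = lubridate::wday(datumzeit, label = TRUE),
                  bin     = lubridate::hour(datumzeit) %/% 3 * 3  # bin start: 0, 3, ..., 21
                ) %>%
                count(userid, weekday, bin, sort = TRUE)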

          Best,
          René

  4. Hi Rene,

    I have a couple of questions. First, is there an English version of your book Email Marketing? Second, are you clustering all users’ click_sprees together, or treating each user as a separate clustering dataset? I assumed the latter from your description, but I don’t see any distinction by user_id in the code for getClusters().

    Thanks,
    G

    • Hi G,
      Hmm, no, there’s only a German version of my book. Chad White’s “Email Marketing Rules” might be the way to go in English? I haven’t read it, but personally, I think a lot of what he wrote in his blogs.

      Concerning the groupings: the click sprees are grouped with group_by(userid, ...). This holds both for identifying the click sprees and for getting the clusters, which is basically done this way:

      set.seed(1)
      df %>%
        ## filter by weekday of response (German locale abbreviations)
        filter(format(datumzeit, "%a") %in% c("So", "Mo", "Di", "Mi", "Do", "Fr", "Sa")) %>%
        ## add click spree ids
        identify_clicksprees %>%
        group_by(userid, campaignid, clickspree_id) %>%
        ## take just the beginning timestamp of each clickspree
        filter(datumzeit == min(datumzeit)) %>%
        ## cluster data: scale the age of the response and one-hot-encode weekdays
        mutate(age = scale(clickspree_ageDays)) %>%
        bind_cols(as.data.frame(model.matrix(
          ~datumzeit - 1,
          select(., datumzeit) %>% mutate(datumzeit = format(datumzeit, "%A"))))) %>%
        group_by(userid) %>%
        mutate(cluster = getClusters(
          datumzeit, eps = .5,
          cbind(age, datumzeitMontag, datumzeitDienstag, datumzeitMittwoch,
                datumzeitDonnerstag, datumzeitFreitag, datumzeitSamstag,
                datumzeitSonntag))) %>%
        group_by(cluster, .add = TRUE) %>%
        mutate(cluster_n = n()) %>%
        select(-matches("datumzeit.+"), -age) %>%
        ...

      Best,
      René
