How to personalize newsletter send times (aka STO) – example code + data

This is a brief follow-up to my previous post “How to personalize newsletter send times (aka “send time optimization”) using AI”. I received a lot of nice feedback since it has been published a few months ago. When questions came in, they were mostly of a technical nature – no wonder, because the subject is technical, too. But instead of providing more and more piecework in the blog’s comments section, I thought I’d better add a full code example including example data. So here it is. If you are interested in that kind of analytics, you should be able to run everything in the free software environment for statistical computing named R.

Have fun reproducing the results and optimizing your own email send times. And don’t forget to share results. 🙂

The GIF shows how to use the code below in R (Rgui.exe):

This is the code I used to optimize sendtimes in December:

#' ######################################
#' ### INSTALL PACKAGES, LOAD DATASET ###
#' ######################################

## First, install packages if necessary:
pks <- c("rJava", "RWeka", "RWekajars", "tidyverse", "lubridate", "hms", "ggplot2", "dplyr")
if (!all(pks %in% rownames(installed.packages())))
  install.packages(pks)

## Second, download the raw data
Url <- "https://www.emailmarketingtipps.de/misc/sto_data.rda"
download.file(url = Url, destfile = tf <- file.path(tempdir(), "sto_data.rda"), mode = "wb")
load(tf) # `df` contains anonymous open & click data
#' ########################################
#' ### SETUP ENVIRONMENT, LOAD PACKAGES ###
#' ########################################
Sys.setenv("JAVA_HOME"="C:\\Program Files\\Java\\jre1.8.0_181_") # adjust as needed, must point to your Java directory
Sys.setenv("WEKA_HOME"="C:\\Users\\Rene\\Weka") # adjust as needed, must point to your Weka directory
Sys.setlocale("LC_TIME", "german") # for German weekday names etc
library("rJava")
library("RWekajars")
library("RWeka")
# WPM("install-package", "optics_dbScan") # install, if necessary
WPM("load-package", "optics_dbScan")
dbscan <- make_Weka_clusterer('weka/clusterers/DBSCAN')
library(tidyverse)
library(lubridate)
library(hms)
library(ggplot2)
pkgconfig::set_config("hms::default_tz"="UTC")
#' #################################
#' ### FUNCTIONS, MAGRITTR PIPES ###
#' #################################

## Identify click sprees ("sessions") based on how far away clicks are in time
identify_clicksprees <- . %>%
  arrange(userid, campaignid, datumzeit) %>%
  group_by(userid, campaignid) %>%
  mutate(
    difftime = datumzeit-lag(datumzeit, default = 0),
    difftime = as.numeric(`units<-`(difftime, "secs"))) %>%
  mutate(clickspree_id = cumsum(difftime > 60*20)) %>% # X seconds x Y minutes
  group_by(clickspree_id, add = T) %>%
  mutate(
    clickspree_nclicks = n(),
    clickspree_nmins = abs(Reduce(`-`, range(datumzeit))),
    clickspree_nmins = as.numeric(`units<-`(clickspree_nmins, "mins")),
    clickspree_ageDays = difftime(Sys.time(), min(datumzeit)),
    clickspree_ageDays = as.numeric(`units<-`(clickspree_ageDays, "days"))) %>%
  ungroup

## POSIXct to radians, radians into cartesian coords
#'
#' @param datumzeit POSIXct, response timestamp
datetime_to_radians_to_cartesian <- function(datumzeit) {
  # https://stackoverflow.com/questions/25213524/cylindrical-clustering-in-r-clustering-timestamp-with-other-data
  h <- as.hms(datumzeit)
  h <- hour(h)+minute(h)/60
  ha <- 2*pi*h/24 # 24 hours = 24x60 = 1440 minutes = 2 pi radians.
  m <- cbind(x = sin(ha), y = cos(ha))
  return(m)
}

## Cluster the timestamps of a user
#'
#' @param datumzeit POSIXct, response timestamp
#' @param eps DBSCAN Epsilon, maximum distance for merging points into clusters
#' @param othervars properly scaled matrix of other variables to include into clustering, besides the time of day of the response (e.g. age of click, weekday, ...)
getClusters <- function(datumzeit, eps = 0.5, othervars = NULL) {
  m <- datetime_to_radians_to_cartesian(datumzeit)
  if (!is.null(othervars)) m <- cbind(m, othervars)
  res <- dbscan(m, c("-E", eps, "-M", 1))
  return(res$class_ids)
}



## Calculate the optimal sendtime for a clickspree based on weighting vars
#'
#' @param datumzeit POSIXct, response timestamp indicating the beginning of a clickspree
#' @param age Numeric indicating the age of the clickspree in days - is it recent (bigger weight) or old (smaller weight) data?
#' @param nclicks Integer indicating the number of clicks within the clickspree (more clicks => bigger weight)
#' @param ntime Numeric indicating the number of minutes the click spree lasted (more time => bigger weight for optimal sendtime)
#' @examples
#' getSendtime(as.POSIXct(c("2018-01-01 22:00:00", "2018-01-02 04:00:00")), rep(1,2), rep(1,2))
getSendtime <- function(datumzeit, age, nclicks, ntime = NULL) {
  ang <- function(x,y) {
    # https://stackoverflow.com/questions/23018056/convert-cartesian-angles-to-polar-compass-cardinal-angles-in-r
    z <- x + 1i * y
    res <- 90 - Arg(z) / pi * 180
    res %% 360
  }
  m <- datetime_to_radians_to_cartesian(datumzeit)
  # Weighted mean within cluster: penalize old clicks and small sprees
  xy <- apply(m, MARGIN = 2, weighted.mean, w = (1/age^1.5)*log(nclicks+1))
  h <- ang(xy[1], xy[2]) / 360 * 24 # cartesian to polar to fractional hour
  wm_time <- hms(hours = trunc(h), minutes = (h - trunc(h)) * 60) # fractional hour to hh:mm:ss
  return(as.numeric(wm_time))
}
#' ###########################
#' ### OPTIMIZE SEND TIMES ###
#' ###########################
set.seed(1);df %>%
  # Date type conversion:
  mutate(datumzeit = ymd_hms(datumzeit), campaignid = ymd(campaignid)) %>%
  # Disregard opens, just look at clicks:
  filter(action == "click") %>% select(-action) %>%
  ## filter by weekday of response
  filter(format(datumzeit, "%a")%in%c("So", "Mo", "Di", "Mi", "Do", "Fr", "Sa")) %>%
  ## add click spree ids
  identify_clicksprees %>%
  group_by(userid, campaignid, clickspree_id) %>%
  ## take just the beginning timestamp of each clickspree
  filter(datumzeit == min(datumzeit)) %>%
  ## cluster data, therefore scale age of response data and one-hot-encode weekdays
  mutate(age = scale(clickspree_ageDays)) %>% bind_cols(as.data.frame(model.matrix(~datumzeit-1, select(., datumzeit) %>% mutate(datumzeit = format(datumzeit, "%A"))))) %>% group_by(userid) %>% mutate(cluster = getClusters(datumzeit, eps=.5, cbind(age, datumzeitMontag, datumzeitDienstag, datumzeitMittwoch, datumzeitDonnerstag, datumzeitFreitag, datumzeitSamstag, datumzeitSonntag))) %>% group_by(cluster, add=TRUE) %>% mutate(cluster_n = n()) %>% select(-matches("datumzeit.+"), -age) %>%
  ## cluster values? Penalties for old sprees + small clusters + clusters with huge time range
  arrange(userid, cluster, clickspree_ageDays) %>% group_by(userid, cluster) %>% mutate(cluster_value = 1/(mean(head(clickspree_ageDays, 3))^2)*log(cluster_n+1L)*1e10 * (1/(as.numeric(max(as.hms(datumzeit))-min(as.hms(datumzeit))))^0)) %>% ungroup %>%
  ## get optimal sendtime
  group_by(userid, cluster) %>% mutate(sendtime = getSendtime(datumzeit, clickspree_ageDays, clickspree_nclicks)) %>% ungroup %>% mutate(sendtime = as.hms(sendtime)) ->
  DF
#' ####################
#' ### PLOT RESULTS ###
#' ####################
DF %>%
  ## take sample
  arrange(userid, campaignid) %>% ungroup %>% filter(userid %in% sample(unique(userid), 4*4))  %>%
  ## plot
  ggplot(aes(x=as.hms(datumzeit), clickspree_nmins, size=clickspree_nclicks, color=clickspree_ageDays)) +
  scale_size(range = c(1.5, 4)) +
  geom_vline(aes(xintercept = sendtime), data = . %>% filter(cluster_n>1) %>% group_by(userid) %>% filter(cluster_value == max(cluster_value)), color="green", size = 1.5, alpha = .5) +
  geom_rect(mapping = aes(xmin=xmin, xmax=xmax, ymin = -Inf, ymax = Inf), data = . %>% filter(cluster_n>1) %>% group_by(userid) %>% filter(cluster_value == max(cluster_value)) %>% group_by(cluster, add=T) %>% dplyr::summarise(xmin = min(as.hms(datumzeit)), xmax = max(as.hms(datumzeit)), cluster_value = cluster_value[1]), color = "red", fill = NA, inherit.aes=F) +
  geom_point(aes(shape=format(datumzeit, "%A")), alpha=.5) +
  facet_wrap(~userid, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  scale_shape_manual(values = 0:6) +
  scale_x_time(labels = function(x) substr(x, 1, 5)) +
  ggtitle("Clickers with best cluster (red) and optimized sendtime (green)" )

DF %>%  
  # Disregard Friday to Sunday (Newsletter is sent on Fridays at ~5pm CET)
  mutate(wday = factor(format(datumzeit, "%a"), levels=c("Mo", "Di", "Mi", "Do"))) %>%
  filter(!is.na(wday)) %>%
  group_by(userid) %>%
  filter(cluster_value == max(cluster_value, na.rm=T)) %>%
  ungroup %>%
  ggplot(aes(as.hms(sendtime))) +
  geom_density() +
  scale_x_time(breaks = seq(0,60*60*24,60*60*3), minor_breaks = seq(0,60*60*24,60*60*1)) +
  theme(axis.text.x = element_text(angle = 60, vjust=0.5)) +
  ggtitle("Best send times, without day of send and weekend")

 

And here are the two charts, which the last block produces:


As you can see from the data, for some subscribers there seems to be rather clear usage patterns in terms of when they read the weekly email roundup. For others, however, one may not say that. It remains to be tested, if personalized send times apart from the usual Friday 5pm dispatch time would increase response rates. What do you think?

Enjoyed this one? Subscribe for my hand-picked list of the best email marketing tips. Get inspiring ideas from international email experts, every Friday: (archive♞)
Yes, I accept the Privacy Policy
Delivery on Fridays, 5 pm CET. You can always unsubscribe.
It's valuable, I promise. Subscribers rate it >8 out of 10 (!) on average.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.