How to personalize newsletter send times (aka STO) – example code + data

This is a brief follow-up to my previous post “How to personalize newsletter send times (aka “send time optimization”) using AI”. I received a lot of nice feedback since it has been published a few months ago. When questions came in, they were mostly of a technical nature – no wonder, because the subject is technical, too. But instead of providing more and more piecework in the blog’s comments section, I thought I’d better add a full code example including example data. So here it is. If you are interested in that kind of analytics, you should be able to run everything in the free software environment for statistical computing named R.

Have fun reproducing the results and optimizing your own email send times. And don’t forget to share results. 🙂

The GIF shows how to use the code below in R (Rgui.exe):

This is the code I used to optimize sendtimes in December:

#' ######################################
#' ### INSTALL PACKAGES, LOAD DATASET ###
#' ######################################

## First, install packages if necessary:
pks <- c("rJava", "RWeka", "RWekajars", "tidyverse", "lubridate", "hms", "ggplot2", "dplyr")
if (!all(pks %in% rownames(installed.packages())))
install.packages(pks)

## Second, download the raw data
Url <- "https://www.emailmarketingtipps.de/misc/sto_data.rda"
download.file(url = Url, destfile = tf <- file.path(tempdir(), "sto_data.rda"), mode = "wb")
load(tf) # `df` contains anonymous open & click data

#' ########################################
#' ### SETUP ENVIRONMENT, LOAD PACKAGES ###
#' ########################################
Sys.setenv("JAVA_HOME"="C:\\Program Files\\Java\\jre1.8.0_181_") # adjust as needed, must point to your Java directory
Sys.setenv("WEKA_HOME"="C:\\Users\\Rene\\Weka") # adjust as needed, must point to your Weka directory
Sys.setlocale("LC_TIME", "german") # for German weekday names etc
library("rJava")
library("RWekajars")
library("RWeka")
# WPM("install-package", "optics_dbScan") # install, if necessary
WPM("load-package", "optics_dbScan")
dbscan <- make_Weka_clusterer('weka/clusterers/DBSCAN')
library(tidyverse)
library(lubridate)
library(hms)
library(ggplot2)
pkgconfig::set_config("hms::default_tz"="UTC")

#' #################################
#' ### FUNCTIONS, MAGRITTR PIPES ###
#' #################################

## Identify click sprees ("sessions") based on how far away clicks are in time
identify_clicksprees <- . %>%
arrange(userid, campaignid, datumzeit) %>%
group_by(userid, campaignid) %>%
mutate(
difftime = datumzeit-lag(datumzeit, default = 0),
difftime = as.numeric(`units<-`(difftime, "secs"))) %>%
mutate(clickspree_id = cumsum(difftime > 60*20)) %>% # X seconds x Y minutes
group_by(clickspree_id, add = T) %>%
mutate(
clickspree_nclicks = n(),
clickspree_nmins = abs(Reduce(`-`, range(datumzeit))),
clickspree_nmins = as.numeric(`units<-`(clickspree_nmins, "mins")),
clickspree_ageDays = difftime(Sys.time(), min(datumzeit)),
clickspree_ageDays = as.numeric(`units<-`(clickspree_ageDays, "days"))) %>%
ungroup

## POSIXct to radians, radians into cartesian coords
#'
#' @param datumzeit POSIXct, response timestamp
datetime_to_radians_to_cartesian <- function(datumzeit) {
# https://stackoverflow.com/questions/25213524/cylindrical-clustering-in-r-clustering-timestamp-with-other-data
h <- as.hms(datumzeit)
h <- hour(h)+minute(h)/60
ha <- 2*pi*h/24 # 24 hours = 24x60 = 1440 minutes = 2 pi radians.
m <- cbind(x = sin(ha), y = cos(ha))
return(m)
}

## Cluster the timestamps of a user
#'
#' @param datumzeit POSIXct, response timestamp
#' @param eps DBSCAN Epsilon, maximum distance for merging points into clusters
#' @param othervars properly scaled matrix of other variables to include into clustering, besides the time of day of the response (e.g. age of click, weekday, ...)
getClusters <- function(datumzeit, eps = 0.5, othervars = NULL) {
m <- datetime_to_radians_to_cartesian(datumzeit)
if (!is.null(othervars)) m <- cbind(m, othervars)
res <- dbscan(m, c("-E", eps, "-M", 1))
return(res$class_ids)
}

## Calculate the optimal sendtime for a clickspree based on weighting vars
#'
#' @param datumzeit POSIXct, response timestamp indicating the beginning of a clickspree
#' @param age Numeric indicating the age of the clickspree in days - is it recent (bigger weight) or old (smaller weight) data?
#' @param nclicks Integer indicating the number of clicks within the clickspree (more clicks => bigger weight)
#' @param ntime Numeric indicating the number of minutes the click spree lasted (more time => bigger weight for optimal sendtime)
#' @examples
#' getSendtime(as.POSIXct(c("2018-01-01 22:00:00", "2018-01-02 04:00:00")), rep(1,2), rep(1,2))
getSendtime <- function(datumzeit, age, nclicks, ntime = NULL) {
ang <- function(x,y) {
# https://stackoverflow.com/questions/23018056/convert-cartesian-angles-to-polar-compass-cardinal-angles-in-r
z <- x + 1i * y
res <- 90 - Arg(z) / pi * 180
res %% 360
}
m <- datetime_to_radians_to_cartesian(datumzeit)
# Weighted mean within cluster: penalize old clicks and small sprees
xy <- apply(m, MARGIN = 2, weighted.mean, w = (1/age^1.5)*log(nclicks+1))
h <- ang(xy[1], xy[2]) / 360 * 24 # cartesian to polar to fractional hour
wm_time <- hms(hours = trunc(h), minutes = (h - trunc(h)) * 60) # fractional hour to hh:mm:ss
return(as.numeric(wm_time))
}

#' ###########################
#' ### OPTIMIZE SEND TIMES ###
#' ###########################
set.seed(1);df %>%
# Date type conversion:
mutate(datumzeit = ymd_hms(datumzeit), campaignid = ymd(campaignid)) %>%
# Disregard opens, just look at clicks:
filter(action == "click") %>% select(-action) %>%
## filter by weekday of response
filter(format(datumzeit, "%a")%in%c("So", "Mo", "Di", "Mi", "Do", "Fr", "Sa")) %>%
## add click spree ids
identify_clicksprees %>%
group_by(userid, campaignid, clickspree_id) %>%
## take just the beginning timestamp of each clickspree
filter(datumzeit == min(datumzeit)) %>%
## cluster data, therefore scale age of response data and one-hot-encode weekdays
mutate(age = scale(clickspree_ageDays)) %>% bind_cols(as.data.frame(model.matrix(~datumzeit-1, select(., datumzeit) %>% mutate(datumzeit = format(datumzeit, "%A"))))) %>% group_by(userid) %>% mutate(cluster = getClusters(datumzeit, eps=.5, cbind(age, datumzeitMontag, datumzeitDienstag, datumzeitMittwoch, datumzeitDonnerstag, datumzeitFreitag, datumzeitSamstag, datumzeitSonntag))) %>% group_by(cluster, add=TRUE) %>% mutate(cluster_n = n()) %>% select(-matches("datumzeit.+"), -age) %>%
## cluster values? Penalties for old sprees + small clusters + clusters with huge time range
arrange(userid, cluster, clickspree_ageDays) %>% group_by(userid, cluster) %>% mutate(cluster_value = 1/(mean(head(clickspree_ageDays, 3))^2)*log(cluster_n+1L)*1e10 * (1/(as.numeric(max(as.hms(datumzeit))-min(as.hms(datumzeit))))^0)) %>% ungroup %>%
## get optimal sendtime
group_by(userid, cluster) %>% mutate(sendtime = getSendtime(datumzeit, clickspree_ageDays, clickspree_nclicks)) %>% ungroup %>% mutate(sendtime = as.hms(sendtime)) ->
DF

#' ####################
#' ### PLOT RESULTS ###
#' ####################
DF %>%
## take sample
arrange(userid, campaignid) %>% ungroup %>% filter(userid %in% sample(unique(userid), 4*4)) %>%
## plot
ggplot(aes(x=as.hms(datumzeit), clickspree_nmins, size=clickspree_nclicks, color=clickspree_ageDays)) +
scale_size(range = c(1.5, 4)) +
geom_vline(aes(xintercept = sendtime), data = . %>% filter(cluster_n>1) %>% group_by(userid) %>% filter(cluster_value == max(cluster_value)), color="green", size = 1.5, alpha = .5) +
geom_rect(mapping = aes(xmin=xmin, xmax=xmax, ymin = -Inf, ymax = Inf), data = . %>% filter(cluster_n>1) %>% group_by(userid) %>% filter(cluster_value == max(cluster_value)) %>% group_by(cluster, add=T) %>% dplyr::summarise(xmin = min(as.hms(datumzeit)), xmax = max(as.hms(datumzeit)), cluster_value = cluster_value[1]), color = "red", fill = NA, inherit.aes=F) +
geom_point(aes(shape=format(datumzeit, "%A")), alpha=.5) +
facet_wrap(~userid, scales = "free_y") +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
scale_shape_manual(values = 0:6) +
scale_x_time(labels = function(x) substr(x, 1, 5)) +
ggtitle("Clickers with best cluster (red) and optimized sendtime (green)" )

DF %>%
# Disregard Friday to Sunday (Newsletter is sent on Fridays at ~5pm CET)
mutate(wday = factor(format(datumzeit, "%a"), levels=c("Mo", "Di", "Mi", "Do"))) %>%
filter(!is.na(wday)) %>%
group_by(userid) %>%
filter(cluster_value == max(cluster_value, na.rm=T)) %>%
ungroup %>%
ggplot(aes(as.hms(sendtime))) +
geom_density() +
scale_x_time(breaks = seq(0,60*60*24,60*60*3), minor_breaks = seq(0,60*60*24,60*60*1)) +
theme(axis.text.x = element_text(angle = 60, vjust=0.5)) +
ggtitle("Best send times, without day of send and weekend")

And here are the two charts, which the last block produces:

As you can see from the data, for some subscribers there seems to be rather clear usage patterns in terms of when they read the weekly email roundup. For others, however, one may not say that. It remains to be tested, if personalized send times apart from the usual Friday 5pm dispatch time would increase response rates. What do you think?

2 Responses to How to personalize newsletter send times (aka STO) – example code + data

Leandro Eleutério | December 4, 2018 at 9:43 pm | Reply

Hi Rene!
Thanks for the code, is really helpful!
I’m facing I few problems with the code, I thought maybe you can help me.
After the line # get optimal sendtime, where the code is:

+ group_by(userid, cluster) %>% mutate(sendtime = getSendtime(datumzeit, clickspree_ageDays, clickspree_nclicks)) %>% ungroup %>% mutate(sendtime = as.hms(sendtime)) ->
+ DF
This erro shows up:
Adding missing grouping variables: `userid`, `campaignid`, `clickspree_id`
Error in mutate_impl(.data, dots) :
Evaluation error: java.lang.ClassNotFoundException.

After the line #plot :
+ ggtitle(“Clickers with best cluster (red) and optimized sendtime (green)” )
This error shows up:
Error in arrange(userid, campaignid) : object ‘userid’ not found

And after the line # Disregard Friday to Sunday (Newsletter is sent on Fridays at ~5pm CET):
+ ggtitle(“Best send times, without day of send and weekend”)

This error shows up:
Error in eval(lhs, parent, parent) : object ‘DF’ not found
Do you have any idea of what might be happening? I’m implementing the code directly at your data set.
I’m also trying to implement this algorithm in a dataset just like yours but im getting another other error after the ## get optimal sendtime:
Adding missing grouping variables: `id`, `email_id`, `clickspree_id`
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
In addition: Warning messages:
1: 298750 failed to parse.
2: In min.default(numeric(0), na.rm = FALSE) :
no non-missing arguments to min; returning Inf
3: In max.default(numeric(0), na.rm = FALSE) :
no non-missing arguments to max; returning -Inf
and also the same two other errors.

Thanks in advance!

Leandro Eleuterio
- Rene | April 27, 2019 at 9:55 am | Reply
  
  Glad you liked it. However, hard to tell why it’s not working with your setup without being able to reproduce the error: “Evaluation error: java.lang.ClassNotFoundException”.

How to personalize newsletter send times (aka STO) – example code + data

2 Responses to How to personalize newsletter send times (aka STO) – example code + data

Leave a Reply Cancel reply

Weekly World of Email

My book @amazon.de

Recommended blog readings

Recent Comments

RSS Feeds & Social Networks

Blogroll

Meet me also on ...


^* I accept the Privacy Policy
♞Get your weekly dose of email marketing must reads, improve your emails easily (archive).