Multivariate testing is a bit underrated. Marketing blogs mostly focus on A/B or A/B/n tests. Those are quick to set up, but they often deliver only incremental gains. Multivariate tests (MVT) are more promising in terms of outcome. Let’s look at how they work.

### Multivariate tests > split tests

Technically speaking, **split tests** are *univariate* tests. You take *one* factor and explore how its different levels affect response. For example, the most frequently tested email **factor** is probably the subject line. Three **levels** to test could be:

- 10% off – winter sale is coming!
- 5€ off – winter sale is coming!
- Free shipping – winter sale is coming!

That being said, guess what **multivariate testing** is about?

Right: Take as *many* factors as you like and explore their different levels. Doing this for each factor separately (= **main effects**) shows how much each one contributes to the overall response. And/or measure how they work in combination (= **interactions**).

What are interactions? I already wrote about them in this post. Think of interactions like eating rice pudding with cinnamon. Each tastes okay, or not so great, on its own (main effect). But combined … extra yummy (interaction)! This also applies to your emails. Maybe one subject line works extraordinarily well together with one particular headline or call to action?

Subject line, headline, key visual, call to action buttons, weekday, landing page elements et cetera – they are all candidates for a closer look. Recent MarketingSherpa research gives some orientation by showing the most popular email elements to test:

### Design of experiments (DoE): reducing test size

Let’s assume someone starts with multivariate email tests. Maybe it’s you, maybe it’s me. But we’ll call him Peter for now. He has already got some cool ideas for challenger candidates:

- **3x subject lines** with and without some clever personalization,
- **3x** different **salutations**,
- **3x** **call to action** buttons, and
- **3x** different **key visuals** for his target group.

Of course he could run several split tests. But he would rather land one big hit.

So … the modern internet marketer would use software for this, but Peter takes his good old Montblanc and writes down a multivariate test design:

However, while writing, an unforeseen problem arises. Four factors with three levels each make a total of 3 × 3 × 3 × 3 = 81 permutations (see picture to the right). Peter would therefore have to set up 81 emails to run his test in a **full factorial design**. Such a design combines every factor level with every possible combination of all other factor levels:

- subject1 combined with salutation1, cta1, and visual1;
- subject1 combined with salutation2, cta1, and visual1;
- and so on.
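To make the count concrete, here is a minimal Python sketch that enumerates the full factorial design. The level names (`subject1`, `salutation1`, and so on) are placeholders for illustration, not Peter's actual copy.

```python
from itertools import product

# Four factors with three levels each; the level names are placeholders.
factors = {
    "subject": ["subject1", "subject2", "subject3"],
    "salutation": ["salutation1", "salutation2", "salutation3"],
    "cta": ["cta1", "cta2", "cta3"],
    "visual": ["visual1", "visual2", "visual3"],
}

# A full factorial design is the Cartesian product of all factor levels.
full_factorial = [
    dict(zip(factors, combo)) for combo in product(*factors.values())
]

print(len(full_factorial))  # 81
print(full_factorial[0])    # the first cell: level 1 of every factor
```

Adding a fifth factor with three levels would multiply the count by three again, which is why full factorial designs get out of hand so quickly.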

A full factorial design allows the most detailed analysis. And it may work for smaller experiments of, say, 2x2x2 (2x subject lines, 2x headlines, 2x calls to action). But 81 test cells are far too many, for obvious reasons:

- Peter’s subscriber list is too small (each of the 81 test groups would get only a tiny fraction of the list, too few recipients to yield significant results);
- Peter has better things to do than spend hours setting up a test.

Suddenly Peter has a bright idea. He remembers that there’s a solution called **fractional factorial design**. This type of design doesn’t include *all* possible combinations, but only the important ones.

Building so-called *small orthogonal arrays*, for example, is a very efficient compression method that falls into this category. The design is still *pairwise complete*. That is, his favorite subject line “{firstname}, electricity prices too high?” occurs at least once with each of the 9 levels of the other 3 factors. And yet the test size is reduced dramatically.

He writes it down:

Yes, Peter got that right. Those **9 combinations** will be enough to **estimate all 4 main effects with 3 levels each, and thus the response of all 81 permutations**, as long as interactions between the factors are small enough to ignore. Think about it. He just reduced his test size from 81 emails to about 11% of that!
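One way to build such a 9-cell design is the classic L9 orthogonal array for four three-level factors. The Python sketch below is an assumption about how it could be generated. Peter's handwritten design may assign the levels differently. Levels are coded 0, 1, 2, and the helper `is_pairwise_complete` is a hypothetical name introduced here to check the property described above.

```python
from itertools import combinations, product

LEVELS = range(3)

# Columns: subject, salutation, cta, visual (levels coded 0, 1, 2).
# The last two columns are derived from the first two modulo 3.
design = [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a in LEVELS for b in LEVELS]

def is_pairwise_complete(rows, n_levels=3):
    """True if every pair of columns shows every level pair at least once."""
    n_cols = len(rows[0])
    wanted = set(product(range(n_levels), repeat=2))
    return all(
        {(row[i], row[j]) for row in rows} == wanted
        for i, j in combinations(range(n_cols), 2)
    )

for row in design:
    print(row)
print(len(design), "cells, pairwise complete:", is_pairwise_complete(design))
```

Dropping any single row breaks the property, so 9 is the smallest size that still pairs every level of one factor with every level of each other factor.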

### 15 minutes setup, 15 evaluation, 15 cheering?

Setting up 1 control and 8 treatments – what does this cost you? Not much. Just duplicate the next mailing 8 times. Varying the subject lines for the clones will take about 2 minutes. The same goes for the greeting and your call to action label. If you already have your 3 visuals ready, allow another 4 minutes. That’s a total of … let’s say 15 minutes, versus nearly 1 ½ hours for the full factorial design.

That’s what Peter did.

Two days later: Now, things become interesting. The test results are in …

The variation containing…

- a subject line “{city}: New rates for saving foxes”,
- a salutation “Hello {firstname}”,
- a call to action “Save now!”,
- and a visual showing a “child”

… performed best. It delivered a response rate of 26.7% – about twice as good as the worst one.

Peter wants to know more:

- How would the other 72 combinations have performed?
- What’s the contribution of each of the four factors to the overall response rate?

### How Peter does the response modeling

He opens his favorite data mining tool in order to train a statistical model based on the 9 cases. There’s no witchcraft required:

- First, he has to break the four variables “subject line”, “salutation”, “cta”, and “visual” into binary form. A binary form contains only 0 vs. 1 (= false vs. true, “does not have that” vs. “has that”). The resulting 12 “0/1” attributes are also called *dummy variables*.
- Next, he feeds a logistic regression with the dummy variables and the response data for the 9 email test cases. The resulting statistical model will then be able to predict the response rates for any possible combination. That’s why it’s called *predictive modeling* or – in direct marketing terms – *response modeling*.
- Finally, he applies the model to the untested 72 email cases plus – for verification purposes – again to the 9 test cases:
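The three steps can be sketched in plain Python. This is a minimal illustration under loud assumptions, not Peter's actual tool or data: the 9 cells come from one standard L9 construction, the send and responder counts are invented, and the fit is a simple gradient ascent written out by hand instead of a library routine.

```python
import math
from itertools import product

LEVELS = 3
FACTORS = ["subject", "salutation", "cta", "visual"]

# 9 tested cells (one standard L9 orthogonal array, assumed here).
design = [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a in range(3) for b in range(3)]

# HYPOTHETICAL data: 1000 sends per cell and invented responder counts.
sends = [1000] * 9
responders = [180, 210, 150, 267, 190, 140, 230, 160, 200]

def dummy_code(cell):
    """Intercept plus one 0/1 attribute per factor level (1 + 4 * 3 values)."""
    row = [1.0]
    for level in cell:
        row.extend(1.0 if level == k else 0.0 for k in range(LEVELS))
    return row

def predict(weights, cell):
    """Predicted response rate from the logistic model."""
    z = sum(w * x for w, x in zip(weights, dummy_code(cell)))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(cells, sends, responders, lr=1.0, steps=5000):
    """Gradient ascent on the binomial log-likelihood of grouped counts."""
    weights = [0.0] * (1 + len(FACTORS) * LEVELS)
    total = sum(sends)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for cell, n, y in zip(cells, sends, responders):
            residual = y - n * predict(weights, cell)
            for j, x in enumerate(dummy_code(cell)):
                grad[j] += residual * x
        weights = [w + lr * g / total for w, g in zip(weights, grad)]
    return weights

weights = fit_logistic(design, sends, responders)

# Score all 81 combinations, tested or not.
scores = {cell: predict(weights, cell) for cell in product(range(LEVELS), repeat=4)}
best = max(scores, key=scores.get)
print("best predicted combination:", dict(zip(FACTORS, best)), round(scores[best], 3))
```

Note that a main-effects model for four three-level factors has 1 + 4 × 2 = 9 free parameters, so an unregularized fit on exactly 9 cells reproduces the observed rates. Any small gap between actual and predicted rates would have to come from settings of the tool, such as regularization.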

### Results

First, Peter takes a quick look at how well his model did. To do so, he compares the actual response rates with what his regression predicts.

Let’s have a look at the left figure below – the column labeled “confidence(1)” contains the predicted response rate:

It’s – of course – not a perfect fit. On average, the prediction missed by about 0.4%. But it’s still very good.

Now, let’s view the best and worst predictions for all 81 test cases (right figure). As we can see, there’s yet another alternative that would have beaten our 26.7%: a personalized call to action gets a predicted response rate of 28.1%. On the other hand, the worst combination of them all delivers just a predicted 11.4%. That means the best combination outperforms the worst one by a factor of about 2.5!

Finally, a look at the absolute effects of each factor level reveals that subject line and call to action were the most effective parameters. The gap between “Get your new rate …” and the geo-personalized subject line amounts to 8.5 percentage points in response rate …
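A rough way to read off such factor-level effects directly from the 9 tested cells is to average the response rate over the three cells that share each level. The sketch below assumes the standard L9 layout and uses invented response rates, so the printed numbers do not reproduce Peter's results. Levels are coded 0, 1, 2.

```python
FACTORS = ["subject", "salutation", "cta", "visual"]

# 9 tested cells (one standard L9 orthogonal array, assumed here).
design = [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a in range(3) for b in range(3)]

# HYPOTHETICAL response rates per cell, invented for illustration only.
rates = [0.180, 0.210, 0.150, 0.267, 0.190, 0.140, 0.230, 0.160, 0.200]

def level_means(factor_index):
    """Average response rate of the three cells that share each level."""
    means = []
    for level in range(3):
        cell_rates = [r for cell, r in zip(design, rates) if cell[factor_index] == level]
        means.append(sum(cell_rates) / len(cell_rates))
    return means

for i, name in enumerate(FACTORS):
    means = level_means(i)
    spread = max(means) - min(means)
    print(f"{name:<10} level means: {[round(m, 3) for m in means]}  spread: {spread:.3f}")
```

Because the design is balanced, every level is averaged over the same mix of the other factors' levels, so the spread between a factor's best and worst level is a fair first indicator of how much that factor matters.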

So … Will you go multi, like Peter did?

What software are you using to calculate the outcome of your multivariate campaign?

I used RapidMiner, today I mostly use R.

Hi Rene, thanks for the wonderful article.

I’m trying to replicate your example using “R”.

I’m wondering which regression algorithm you are using. In your article, you mention “logistic regression”, but I thought it’s only used with a binary response (0 vs. 1). In this example, we have a continuous response.

Thanks for helping me out!

Phil

Hi Phil, glad you liked the post. In R, having a direct marketing response variable like clicks, I’d go with glm() and the binomial() family. What kind of response do you have?

Regards,

René