<img height="1" width="1" style="display:none" src="https://www.facebook.com/tr?id=204513679968251&amp;ev=PageView&amp;noscript=1">


Picking a Winner: Advanced Modeling for Concept Tests

There are many ways to predict which of several concepts will “win” in a retail environment. 

In another blog, I describe some of the methods we employ to conduct a best-practice experimental design for testing concepts, packaging options, advertising copy, or anything else where several discrete choices exist. 

But it’s often not enough to simply understand which concept is best; it is also critical to understand why.

Knowing why consumers pick a winner helps manufacturers and retailers capitalize on the factors that motivate consumers, whether consciously or subconsciously, to buy a product or not.

This is yet another advantage of the monadic test design: respondents react to a single stimulus rather than comparing several of them, so the attribute battery they complete is viewed in the context of that single stimulus.  Our analysis then proceeds in three steps:

  1. We view the responses to the attribute questions as a whole (that is, without regard to which concept respondents evaluated) to get a sense of the overall magnitude of the motivating factors that drive purchase decisions for that product.
  2. We dichotomize the dependent variable (purchase intent) into two categories: buyers and non-buyers.  It is our decision what values we assign to each category.  Say we used a 5-point Likert scale capturing purchase likelihood, with 5 being “absolutely would buy.”  Instead of using the common top-2 box (4s and 5s), we may decide that only respondents who answered 5 are buyers and everyone else is a non-buyer.  This is the strictest interpretation of a purchase intent scale, and it gives us a cleaner delineation of factors in our predictive analysis (see the recoding sketch after this list).
  3. We choose our modeling method.  In this case, we’ll use logistic regression.  This generalized linear modeling technique tells us explicitly how to improve the odds of consumers purchasing our product.
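
As a concrete illustration of step 2, here is a minimal sketch of that strict top-box recoding in R.  The raw rating column name (PurchaseIntent) is an assumption; the original analysis starts from the already-coded Purchase variable in the labels data frame used later in the post.

> labels$Purchase <- ifelse(labels$PurchaseIntent == 5, 1, 0) # hypothetical raw 1-5 rating; 1 = "absolutely would buy", 0 = everyone else
> table(labels$Purchase) # sanity-check the buyer/non-buyer split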

Which label should we use for our product?

In this post, we’ll use some real data from a consumer packaged goods study to predict which product features most influence purchase likelihood.  A random sample of 2,989 adults in the United States was presented with one of several package label design concepts and then asked a series of questions about that concept.

The questions were:

  1. Consider the package design on the left.  Using the 5-point scale below, what is the likelihood you would buy this product? [PURCHASE]
  2. How new and different do you think this product is compared to others like it on the market? [UNIQUE]
  3. How much do you like this product? [HEDONIC]
  4. How much do you agree with the statement “This is a brand I trust”? [TRUST]
  5. How much do you agree with the statement “I would save money with this product”? [SAVE]

We start with some exploratory data analysis, namely Tukey’s EDA and correlations.


We see the ranges, medians, and means for each of our variables.
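
The original post shows those summary statistics as an image; a call along the following lines reproduces them, assuming the data frame is named labels as in the code below:

> summary(labels) # min, quartiles, median, and mean for each variable (Tukey's five-number summary plus the mean)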

For the correlations, we get both numeric and graphical output.  A picture is worth 1,000 words.

> library(corrgram) # the corrgram package provides the correlogram and its panel functions
> corrgram(labels, order=TRUE, lower.panel=panel.shade,
+   upper.panel=panel.pie, text.panel=panel.txt,
+   main="Correlations Between Concept Variables") # Make a pretty correlogram

Correlations between concept variables

> table(Purchase)/length(Purchase) # Calculate percentages of purchasers and non-purchasers

Purchase
   0     1
0.56 0.44

 

Nearly half (44 percent) of our respondents say they absolutely would buy the product they saw.  We’ll need to convert this percentage into odds:

 

> purchprob <- sum(Purchase)/length(Purchase)
> purchprob

[1] 0.44

> nopurch <- 1-purchprob
> nopurch

[1] 0.56

> purchaseodds <- purchprob/nopurch
> purchaseodds # The actual odds of someone purchasing these products

[1] 0.79

 

The odds of any given consumer buying a product with one of these label concepts are about 0.8 to 1.  To put that in context, if our purchaseodds variable were 1.0, the odds of buying would be 50/50.  Logistic regression enables us to see whether we can improve these odds.
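
For readers who want the mechanics: logistic regression models the log of the purchase odds as a linear function of the ratings, log(odds of purchase) = b0 + b1×Hedonic + b2×Save + b3×Trust + b4×Unique, so a one-point increase in any rating multiplies the odds by exp(b) for that rating.  Those exponentiated coefficients (the odds ratios) are what we interpret below.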

 

> GLM.3 <- glm(Purchase ~ Hedonic + Save + Trust + Unique, family=binomial(logit), data=labels) # Initial logistic regression
> summary(GLM.3)

 

Call:

glm(formula = Purchase ~ Hedonic + Save + Trust + Unique, family = binomial(logit), data = labels)

 

Deviance Residuals:

   Min      1Q  Median      3Q     Max 
-2.110  -0.748  -0.264   0.737   3.109 

 

Coefficients:

            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -8.8063     0.3528  -24.96  < 2e-16 ***
Hedonic       0.2969     0.0319    9.30  < 2e-16 ***
Save          0.4292     0.0628    6.83  8.5e-12 ***
Trust         1.2309     0.0640   19.23  < 2e-16 ***
Unique       -0.0554     0.0444   -1.25     0.21   

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for binomial family taken to be 1)

 

Null deviance: 4102.3  on 2988  degrees of freedom
Residual deviance: 2934.2  on 2984  degrees of freedom
AIC: 2944

 

Number of Fisher Scoring iterations: 5

 

The Unique variable is not statistically significant (p = 0.21), so we run the model again without it.

 

> GLM.4 <- glm(Purchase ~ Hedonic + Save + Trust, family=binomial(logit), data=labels) # The logistic regression model
> summary(GLM.4)

 

Call:

glm(formula = Purchase ~ Hedonic + Save + Trust, family = binomial(logit), data = labels)

Deviance Residuals:

   Min      1Q  Median      3Q     Max 
-2.050  -0.759  -0.271   0.746   3.140 

 

Coefficients:

            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -8.8060     0.3524  -24.99  < 2e-16 ***
Hedonic       0.2784     0.0282    9.88  < 2e-16 ***
Save          0.4284     0.0627    6.83  8.6e-12 ***
Trust         1.2259     0.0638   19.21  < 2e-16 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for binomial family taken to be 1)

 

Null deviance: 4102.3  on 2988  degrees of freedom
Residual deviance: 2935.7  on 2985  degrees of freedom
AIC: 2944

 

Number of Fisher Scoring iterations: 5
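
As a quick check (a sketch; this step isn’t shown in the original output), a likelihood-ratio test comparing the two fits confirms that dropping Unique barely changes the model:

> anova(GLM.4, GLM.3, test="Chisq") # deviance difference of about 1.5 on 1 df, consistent with Unique's p-value of 0.21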

 

> library(car) # Confint() with profile-likelihood ("LR") intervals comes from the car package
> Confint(GLM.4, level=.95, type="LR")

            Estimate 2.5 % 97.5 % exp(Estimate)   2.5 % 97.5 %
(Intercept)    -8.81 -9.51  -8.13       0.00015 7.4e-05 0.0003
Hedonic         0.28  0.22   0.33       1.32107 1.3e+00 1.3968
Save            0.43  0.31   0.55       1.53485 1.4e+00 1.7370
Trust           1.23  1.10   1.35       3.40730 3.0e+00 3.8669

 

The second logistic regression run provides some very important information.  All of the independent variables in our model are statistically significant, and strongly so.  Improving consumers’ opinions on any one of them will improve the odds of purchase:

  • Improving how much consumers “like” the product will increase the purchase odds by 32 percent for every point of improvement on the hedonic scale.
  • Introducing the product with an attractive price will increase the purchase odds by 53 percent for every point of improvement on the “saves me money” variable.
  • Articulating that “this is a brand I trust” through the labeling, messaging, and media will multiply the odds of purchase by about 3.4 (a 241 percent increase) for every point of improvement on the “trust” rating scale.

Let’s do the math.  Say we were able to improve how strongly consumers agree that this is “a brand I trust” from its current mean rating of 3.87 by half a point, to 4.37.  Because the model works on the log-odds scale, a half-point gain multiplies the odds by exp(0.5 × 1.23) ≈ 1.85.  The new odds are therefore about 0.79 × 1.85 ≈ 1.46.

Thus, the odds of purchase go from 0.79 to 1 to roughly 1.46 to 1, turning the purchase odds in our favor.  In fact, this half-point increase in the “brand I trust” rating nearly doubles the odds of a consumer purchasing our product.  And turning the odds in our favor is what successful product marketing is all about.
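
For illustration, here is that calculation in R, reusing the GLM.4 and purchaseodds objects created above (a sketch; this step isn’t shown in the original post):

> trust.or <- exp(0.5 * coef(GLM.4)[["Trust"]]) # odds multiplier for a half-point gain in Trust, roughly 1.85
> purchaseodds * trust.or # new predicted odds of purchase, up from about 0.79 to 1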

 

Dino Fire

Dino serves as President, Market Research & Data Science. He seeks answers to questions about, and predictions of, consumer behavior. Previously, Dino served as Chief Science Officer at FGI Research and Analytics. He is our version of Curious George, constantly seeking a different perspective on a business opportunity, whether new product design or needs-based segmentation. If you can write an algorithm for it, Dino will become engaged. Dino spent almost a decade at Arbitron/Nielsen in his formative years. He holds a BA from Kent State and an MS from Northwestern, and he seems to have a passion for all numeric expressions.