Customer Acquisition Scoring Model
This analysis aims to establish how targeted marketing efforts can maximize profits by acquiring customers with a high likelihood of buying.
In this project, I built a logistic regression model to compute a score, a response rate, and a lift rate for each prospect.
Context:
Orange Apron is a subscription-based meal delivery service that provides subscribers with three meal kits per week for 52 weeks of the year. Orange Apron is considering a customer acquisition campaign and plans to run a field experiment to determine appropriate target customers. To implement the campaign, Orange Apron has rented a list containing information on 500 households. The list contains some information about the households captured in four variables.

Orange Apron has sent an invitation to join the service to all 500 names on the list. The invitation offer includes a deep discount on three weeks of service. We observe whether or not each of the 500 households accepted the invitation: y is 1 if the household joined the service and 0 otherwise. We use a random sample of 244 households as the estimation sample (i.e., we estimate the scoring model on this data). The remaining 256 households are used to test the list scoring and evaluate how successful the target selection was. The 244-household list will henceforth be referred to as the estimation list, and the 256-household list as the holdout list.
Estimation data

Holdout data

Variables:
- Children in HH: Binary indicator of whether or not children are present in the household (1=yes, 0=no)
- hl1, hl2, hl3: Hotline variables which are computed by the list owner and represent different index variables that indicate positive or negative purchase intent
The hypothesis is that hl1 is positively correlated with interest while hl2 and hl3 are negatively correlated with interest in a meal delivery service. Let's get into the analysis now!
First, I ran a logistic regression on the estimation list to predict the buying decision as a function of the available variables.
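For readers who want to see the mechanics, here is a minimal sketch of fitting a logit model by gradient ascent on the log-likelihood. It is not the tool used for this project. The toy rows, the two-feature layout, and the names `fit_logit`, `X_toy`, and `y_toy` are all assumptions made for illustration. In practice a statistics package would do this step and also report standard errors.

```python
import math

def sigmoid(z):
    """Logistic function: maps a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, lr=0.5, n_iter=5000):
    """Fit intercept + coefficients by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            err = yi - p  # gradient contribution of this observation
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy rows [children_in_hh, hl1]; NOT the Orange Apron data.
X_toy = [[1, 0.9], [1, 0.4], [0, 0.8], [0, 0.2],
         [1, 0.7], [0, 0.1], [0, 0.6], [1, 0.3]]
y_toy = [1, 0, 1, 0, 1, 0, 0, 1]

beta_toy = fit_logit(X_toy, y_toy)
print("intercept, children, hl1:", [round(b, 3) for b in beta_toy])
```

The sign of each fitted coefficient gives the direction of the effect. Judging statistical significance additionally needs standard errors, which this sketch does not compute.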

From the regression output, I drew the following inferences:
- "Children in HH" has a positive and statistically significant effect.
- hl1 has a positive and statistically significant effect.
- hl2 has a negative and statistically significant effect.
- hl3 has a negative effect, but we cannot reject the null hypothesis that hl3 has no effect.
- Holding the other variables constant, a 1-unit increase in hl1 increases the odds of joining by 3.4%.
- Holding the other variables constant, a 1-unit increase in hl2 decreases the odds of joining by 2.7%.
- A 1-unit increase in hl3 has no statistically detectable effect on the odds, as the coefficient on hl3 is not statistically significant.
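The percentage statements in the list above come from exponentiating a logit coefficient: a 1-unit increase in a variable multiplies the odds by exp(beta), so the percent change in the odds is (exp(beta) - 1) * 100. The short sketch below illustrates that conversion. The coefficient values are back-computed from the percentages quoted above purely for illustration, since the fitted values themselves are not reproduced in this write-up.

```python
import math

def pct_change_in_odds(beta):
    """Percent change in the odds for a 1-unit increase in the variable."""
    return (math.exp(beta) - 1.0) * 100.0

# Illustrative coefficients implied by the quoted percentages (assumption:
# these are back-computed, not read off the regression output).
beta_hl1 = math.log(1 + 0.034)
beta_hl2 = math.log(1 - 0.027)

print(f"hl1: {pct_change_in_odds(beta_hl1):+.1f}% change in odds")
print(f"hl2: {pct_change_in_odds(beta_hl2):+.1f}% change in odds")
```

Note that this is a change in the odds, not in the probability. The change in probability depends on where the household starts on the logistic curve.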
For the next part of the analysis, I used my regression coefficients to compute the score for each individual in my holdout data set. From the scores, I derived the average response rate, the lift rate, and the marginal effect of each variable.
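The scoring step can be sketched as follows. The coefficients and holdout rows are hypothetical placeholders, not the fitted model or the real list. Lift is taken here as each household's predicted response probability divided by the average response rate, which is one common definition and an assumption about how it was computed in this project. The marginal effect uses the standard logit formula beta * p * (1 - p).

```python
import math

def response_prob(row, intercept, coefs):
    """Compute the score (linear index), then map it to a response probability."""
    score = intercept + sum(coefs[name] * row[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients and holdout rows, for illustration only.
intercept = -1.0
coefs = {"children": 0.8, "hl1": 0.03, "hl2": -0.03, "hl3": -0.01}
holdout = [
    {"children": 1, "hl1": 40, "hl2": 10, "hl3": 5},
    {"children": 0, "hl1": 25, "hl2": 20, "hl3": 10},
    {"children": 0, "hl1": 10, "hl2": 40, "hl3": 20},
]

probs = [response_prob(r, intercept, coefs) for r in holdout]
avg_response_rate = sum(probs) / len(probs)

# Lift: how much more likely a household is to respond than the average one.
lift = [p / avg_response_rate for p in probs]

# Marginal effect of hl1 on the response probability, per household.
marginal_hl1 = [coefs["hl1"] * p * (1 - p) for p in probs]
```

A lift above 1 marks a household that is more likely than average to accept the offer.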


I then wanted to see whether my model was a good fit, so I sorted the holdout data in descending order of response probability and plotted the cumulative actual and expected sales from sending N solicitations to the N best customers (N = 1 to 256).
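The two curves in that comparison are running sums over the ranked list. A minimal sketch, using made-up (predicted probability, actual outcome) pairs rather than the 256 holdout records:

```python
from itertools import accumulate

# Hypothetical (predicted probability, actual outcome) pairs; not the real holdout list.
scored = [(0.10, 0), (0.65, 1), (0.30, 0), (0.80, 1), (0.45, 1), (0.20, 0)]

# Rank households from most to least likely to respond.
ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)

# Entry N-1 of each list is the total from soliciting the N best households.
cum_expected = list(accumulate(p for p, _ in ranked))  # sum of predicted probabilities
cum_actual = list(accumulate(y for _, y in ranked))    # count of observed sign-ups

for n, (exp_sales, act_sales) in enumerate(zip(cum_expected, cum_actual), start=1):
    print(f"N={n}: expected={exp_sales:.2f}, actual={act_sales}")
```

Plotting `cum_expected` and `cum_actual` against N gives the two curves. The closer they track each other, the better calibrated the scores are on the holdout data.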

As we can observe, the model is a good fit, as the actual sales closely track the expected sales.
Knowing that the grocery and meal delivery business has low margins and a high churn rate, I wanted to see how profits would increase from targeting only the customers whose predicted response rate was above the cut-off (which I computed from customer lifetime value (CLV) and solicitation cost).
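The write-up does not state the exact cut-off formula, so the sketch below assumes the usual break-even rule: solicit a household only if its expected value, p * CLV, covers the solicitation cost, which gives a cut-off of cost / CLV. The CLV, cost, and probabilities are hypothetical placeholders, not Orange Apron figures.

```python
def break_even_rate(clv, cost):
    """Response probability at which expected revenue equals the solicitation cost."""
    return cost / clv

def expected_profit(probs, clv, cost):
    """Expected profit from soliciting every household in probs."""
    return sum(p * clv - cost for p in probs)

# Hypothetical inputs, for illustration only.
clv, cost = 20.0, 5.0
probs = [0.10, 0.65, 0.30, 0.80, 0.45, 0.20]

cutoff = break_even_rate(clv, cost)
targeted = [p for p in probs if p >= cutoff]

profit_all = expected_profit(probs, clv, cost)          # solicit everyone
profit_targeted = expected_profit(targeted, clv, cost)  # solicit only above the cut-off

print(f"cut-off={cutoff:.2f}, mass mailing={profit_all:.2f}, targeted={profit_targeted:.2f}")
```

Under this rule, dropping households below the cut-off can only raise expected profit, since each of them has a negative expected contribution.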

