This article is aimed at both technically savvy and business-oriented readers who would like to understand what exactly churn prediction is and why it is becoming an essential tool for digitally positioned companies (or those that want to become so). We will first provide a definition, then explain the business value of churn prediction using an interactive example, and finally shed light on the underlying models.
What is Churn Prediction?
From a company’s perspective, customer attrition is referred to as churn, and predicting it at the level of individual customers is referred to as churn prediction. The aim is to predict which customers will leave so that appropriate countermeasures can be taken at an early stage. Depending on the business model, churn is more or less easy to measure. For a network operator (telco, power grid) or a streaming provider, it is easy to determine: churners are the customers who do not renew their contracts when they expire. But how does Amazon determine which customers are churning, i.e. will no longer place orders in the future? Is it a threshold in the transaction frequency that is not met? Or revenue falling below the average within a certain time window? The exact definition of churn therefore depends largely on the business context and should be chosen according to the objective. The good news is: once you have defined churn for your own business context, you are basically in a position to calculate predictions.
Why does my business need churn prediction?
Companies have a clear interest in collecting as much sales-relevant information about their customers as possible in order to tailor products and services even better to them. Perhaps the most relevant piece of such information is the likelihood that a customer will churn. If you are able to determine which customers have churned in the past (e.g. from transaction records), you can then combine this data with other information about the customer. This information may be available internally (usage behavior, socio-demographic data, surveys, etc.) or purchased externally (financial data for B2B business, consumer behavior, other socio-demographic data, etc.). There are two key reasons why this effort is worthwhile.
Firstly, it helps to understand the departing customers as well as possible, i.e. to find out in which dimensions they differ from the remaining customers. Cleverly visualised, these insights are of great value for strategic planning.
Secondly, these patterns can be used to predict which customers are particularly at risk of churn. This allows expensive customer loyalty measures to be targeted at individual customers and tailored to maximize their efficiency (e.g. the number of contract renewals per euro invested). We will now examine this second, potentially very lucrative area in detail using an interactive example.
An interactive example
For the sake of simplicity, let’s stick with a fictitious example of a streaming service, which we will call NetPrime. NetPrime currently sells subscriptions to one million users for 20 euros per month, so the business-relevant time interval for churn prediction in this example is one month.
Unfortunately, NetPrime has observed that, although the total number of customers remains constant, 20% of customers churn every month. That is 200,000 customers at EUR 20 each, i.e. EUR 4 million in monthly subscription revenue lost. However, NetPrime has the option of making special offers to customers at risk of churning, which can prevent churn with a certain success rate (this can be determined empirically; for the sake of simplicity, we assume 70%). This customer retention measure costs EUR 10 per customer. Let’s assume that NetPrime can afford special offers for EUR 10,000, i.e. can address 1,000 customers. The question now is which customers should receive this special offer. It would of course be highly inefficient to send it to 1,000 randomly selected customers: in this case NetPrime would reach only about 200 churners by chance. Ideally, NetPrime would direct customer retention measures exclusively to customers at risk of churning. A churn prediction model can help here by calculating the churn probabilities of individual customers. The quality of such a model is assessed here using what is known as precision, i.e. the proportion of customers addressed who would actually churn. For this example, let’s look at the formulas for costs and sales:
Costs = number of customers addressed * costs per customer
Sales = number of customers addressed * precision * success rate * sales per customer
It is therefore clear that our return, i.e. the difference between sales and costs, depends largely on how precisely we can target the churn prevention measures and how high their success rate is. To make this relationship tangible, the interactive graphic below illustrates the profit as a function of precision. You can change the various parameters to get a feel for the model quality (precision) at which implementing such a churn prediction model becomes profitable.
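For readers who prefer code to sliders, the two formulas above can be sketched in a few lines of Python. The function names are our own, and the sketch counts only one month of retained revenue per prevented churn, exactly as the formulas do. Setting sales equal to costs also gives the precision at which the campaign breaks even.

```python
def campaign_profit(n_addressed, precision, success_rate,
                    sales_per_customer, cost_per_customer):
    """Return (sales, costs, profit) for one retention campaign."""
    costs = n_addressed * cost_per_customer
    sales = n_addressed * precision * success_rate * sales_per_customer
    return sales, costs, sales - costs


def break_even_precision(success_rate, sales_per_customer, cost_per_customer):
    """Precision at which sales equal costs, i.e. profit is zero."""
    return cost_per_customer / (success_rate * sales_per_customer)


# NetPrime figures from the example: 1,000 offers at EUR 10 each,
# 70% success rate, EUR 20 of monthly revenue per retained customer.
# A precision of 0.20 corresponds to addressing customers at random.
random_targeting = campaign_profit(1000, 0.20, 0.70, 20, 10)
minimum_precision = break_even_precision(0.70, 20, 10)

print(random_targeting)
print(round(minimum_precision, 3))
```

With these numbers, random targeting loses about EUR 7,200, and the campaign only pays off once roughly 71% of the addressed customers are genuine churners. If a retained customer is expected to stay for several months, the sales per customer would be higher and the required precision correspondingly lower.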
If you are wondering at this point why everything here is about precision and how it can even be determined in advance, I can reassure you: We will devote the last part of this article to this question.
How do churn prediction models work?
For the moment, put yourself in the shoes of the modelers or data scientists. The first step is to bring together all available data sources. As mentioned at the beginning, this can be transaction or usage data, but also socio-demographic data, financial data or information on customer satisfaction. Next, it is determined which customers churned in each transaction interval (in the example above, one month). The result is a series of monthly tables that contain all available data (columns) for each customer (rows) as well as a churn value (typically yes or no).
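How the churn value is derived depends on the churn definition chosen earlier. As a minimal sketch for the subscription case, assume we know the set of customers with an active subscription in each month; a customer then counts as churned if they are active in one month but no longer in the next. The customer IDs and the function name are hypothetical.

```python
def label_churn(active_this_month, active_next_month):
    """Map each customer active this month to True (churned) or False (retained)."""
    return {
        customer_id: customer_id not in active_next_month
        for customer_id in sorted(active_this_month)
    }


# Hypothetical customer IDs, for illustration only.
january = {"c1", "c2", "c3", "c4"}
february = {"c1", "c3", "c5"}  # c5 is new in February and gets no January label

labels = label_churn(january, february)
print(labels)
```

Each label would then be attached as the churn column to that customer's row in the monthly table.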
In the second step, a mathematical model is trained to learn the relationship between the data and the churn value. To do this, the data set is randomly divided into training data and test data (typically around 80:20). Common classification models, which the technically savvy reader will certainly recognize, are logistic regression, the support vector machine and the decision tree, or a combination of several models (so-called "ensembles"). When training the model, special care is taken to ensure that it picks up only the essential patterns in the data and does not simply "learn" the data by heart. This is an important prerequisite for the model to deliver good results on future data (in jargon, this is called "generalizing").
In the third and final step, the model is evaluated, and the precision can be estimated at this point. The model was trained on the training data and is now tested on the test data. In this context, "testing" means comparing the model predictions with the actual churn values. The so-called confusion matrix can then be computed for the predictions on the test data. It counts the true positives (predicted churners who actually churn), the false positives (predicted churners who stay), the false negatives (churners the model misses) and the true negatives (correctly predicted non-churners).
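The random 80:20 division can be sketched with Python's standard library alone. The helper name and the fixed seed are our own choices; in practice, data scientists usually rely on the equivalent utility of their machine learning library.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for 100 customer rows; real rows would carry features and a churn value.
rows = list(range(100))
train, test = split_train_test(rows)
print(len(train), len(test))
```

The important property is that no row ends up in both parts, so the test data really is unseen when the model is evaluated.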
The precision is then calculated as follows:
Precision = True positives / (True positives + False positives)
Because the model has never seen the test data before, this precision serves as an estimate of how reliable the model’s predictions will be on future data. The precision calculated here can therefore be compared to the interactive graph to determine whether a given model works well enough to enable profitable churn prediction in practice.
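As a small illustration, the confusion matrix counts and the resulting precision can be computed directly from two lists of yes/no values. The eight test customers below are invented for the example, and the function names are our own.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives from two lists of booleans."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(a and p for a, p in pairs),          # churner, flagged
        "fp": sum(p and not a for a, p in pairs),      # non-churner, flagged
        "fn": sum(a and not p for a, p in pairs),      # churner, missed
        "tn": sum(not a and not p for a, p in pairs),  # non-churner, not flagged
    }


def precision_from_counts(counts):
    """Share of flagged customers who actually churned."""
    flagged = counts["tp"] + counts["fp"]
    return counts["tp"] / flagged if flagged else 0.0


# Invented test data: True means "churned" (actual) or "predicted to churn".
actual = [True, True, False, False, True, False, False, False]
predicted = [True, False, True, False, True, False, False, False]

counts = confusion_counts(actual, predicted)
print(counts, precision_from_counts(counts))
```

Here the model flags three customers, two of whom really churn, so the precision is 2/3.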
What about false negatives?
Attentive readers will have noticed that we completely ignore the false negatives, i.e. the churners that escape our model. Of course, there are ways and means to get these errors under control. The relevant metric here is "recall", which is calculated as follows:
Recall = True positives / (True positives + False negatives)
Recall is the proportion of actual churners that our model detects as such. In practice, we can tune our model to strike a good compromise between precision and recall.
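One common way to do this tuning, assuming the model outputs a churn probability per customer, is to move the threshold above which a customer is flagged as a churner. The sketch below uses invented scores to show how a strict threshold favors precision while a lenient one favors recall.

```python
def precision_recall(actual, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as churn."""
    predicted = [score >= threshold for score in scores]
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    fp = sum(p and not a for a, p in pairs)
    fn = sum(a and not p for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented churn probabilities and actual outcomes for six test customers.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
actual = [True, True, False, True, False, False]

strict = precision_recall(actual, scores, threshold=0.7)
lenient = precision_recall(actual, scores, threshold=0.35)
print(strict, lenient)
```

In this toy data, the strict threshold flags only real churners but misses one of the three, whereas the lenient threshold catches all three churners at the price of one false alarm.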
How is a churn prediction model used in practice?
Once a model is trained, it can be applied to all customer data before the end of a transaction interval to obtain a list of customers that the model predicts will churn, i.e. cancel their subscription. These "at-risk customers" can then be profiled and addressed with tailored customer retention measures to prevent as many of them as possible from churning.
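Combined with a limited retention budget, as in the NetPrime example, one simple approach is to rank customers by their predicted churn probability and address only as many as the budget allows. This is a sketch under that assumption; the probabilities and customer IDs are invented, and the function name is our own.

```python
def select_for_retention(churn_probabilities, budget, cost_per_customer):
    """Return the IDs of the highest-risk customers the budget can cover."""
    n_offers = int(budget // cost_per_customer)
    ranked = sorted(
        churn_probabilities.items(),
        key=lambda item: item[1],
        reverse=True,  # highest churn probability first
    )
    return [customer_id for customer_id, _ in ranked[:n_offers]]


# Invented model output: customer ID -> predicted churn probability.
probabilities = {"c1": 0.05, "c2": 0.91, "c3": 0.40, "c4": 0.77, "c5": 0.12}

targets = select_for_retention(probabilities, budget=30, cost_per_customer=10)
print(targets)
```

With a budget of EUR 30 and EUR 10 per offer, the three customers with the highest predicted risk are selected.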
Summary
What is Churn Prediction?
Predicting the likelihood that a customer will churn.
Why do I need churn prediction?
To better understand customers who are leaving and to be able to implement expensive customer retention measures in a targeted and customized manner.
What data do I need for this?
In principle, information about customer interactions is sufficient (e.g. "Call to customer service on January 1st, 2020 at 12:34 pm" or "Purchase of XYZ on April 3rd, 2020 at 5:10 pm"). In general, however, the more information is available, the more precisely at-risk customers can be identified.
How can I determine in advance whether my data is sufficient?
Based on the costs and efficiency of my customer retention measures, a minimum model quality can be determined (see interactive graphic). If a model with this quality can be trained on the data, it is sufficient.