Whizz Education

User cancellation modelling - on clustering of customer behaviours


Retained customers generally generate more revenue than new customers, and making a sale to a new customer can cost up to five times as much as selling to an existing one, depending on the business. Many companies therefore form a Customer Relationship Management (CRM) team focused on customer retention strategies. A crucial step is to identify high-risk customers who intend to stop using the service. This assessment is known as churn prediction.

Our project aims to perform churn prediction by investigating clusters of customer behaviour, and to formulate the process as a scalable pipeline that can be easily reused, updated and extended for many applications. In particular, we apply the pipeline to analyse pupil subscriber data for Whizz Education (referred to as “Whizz”). Whizz provides an online virtual tutorial service, Math-Whizz, which pupils access by purchasing subscriptions. We investigate whether behaviour-based data analytics can help detect potential subscription cancellations.

Figure: Training and prediction workflow and its downstream process.

The pipeline starts by representing pupils’ behaviours as numerical data, a step known as feature extraction. The structured feature data are fed into a mixture model, which describes the distribution of the features as a weighted combination of simple component distributions. Since each component distribution is assumed to be generated by an unobserved state, we are interested in uncovering these states together with the component distributions they emit, also known as clusters. The mixture model is deemed effective if the identified clusters of pupils exhibit distinguishable proportions of churn, or churn rates. Estimating the churn rate of each cluster in the fitted mixture model yields a trained model that maps features to a probability of churn. This enables us to predict the churn probabilities of new pupils by feeding their feature data into the trained model.
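The train-and-predict steps described above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a single hypothetical feature (weekly sessions), Gaussian component distributions, two clusters, and invented toy data. The function names (`fit_mixture`, `cluster_churn_rates`, `churn_probability`) are likewise made up for illustration. The mixture is fitted by expectation-maximisation, each cluster's churn rate is estimated from membership-weighted churn labels, and a new pupil's churn probability is the membership-weighted average of the cluster churn rates.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (assumed component distribution)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability of each cluster given the feature value x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def fit_mixture(xs, initial_means, n_iter=50, min_var=1e-3):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    means = list(initial_means)  # copy so the caller's list is not modified
    k = len(means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    for _ in range(n_iter):
        # E-step: soft cluster membership of every pupil.
        resp = [responsibilities(x, weights, means, variances) for x in xs]
        # M-step: re-estimate weight, mean and variance of each cluster.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)  # floor avoids a collapsed cluster
    return weights, means, variances


def cluster_churn_rates(xs, churned, params):
    """Churn rate of each cluster, weighting labels by soft membership."""
    k = len(params[0])
    num, den = [0.0] * k, [0.0] * k
    for x, c in zip(xs, churned):
        r = responsibilities(x, *params)
        for j in range(k):
            num[j] += r[j] * c
            den[j] += r[j]
    return [n / d for n, d in zip(num, den)]


def churn_probability(x, params, rates):
    """Map a feature value to a churn probability via the fitted clusters."""
    r = responsibilities(x, *params)
    return sum(rj * pj for rj, pj in zip(r, rates))


# Invented toy data: weekly sessions per pupil and whether the pupil churned.
sessions = [1.0, 1.5, 2.0, 2.5, 1.2, 8.0, 9.0, 8.5, 9.5, 10.0]
churned = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

params = fit_mixture(sessions, initial_means=[0.0, 10.0])
rates = cluster_churn_rates(sessions, churned, params)

print("cluster churn rates:", rates)
print("churn probability at 1.5 sessions:", churn_probability(1.5, params, rates))
print("churn probability at 9.0 sessions:", churn_probability(9.0, params, rates))
```

On this toy data the low-engagement cluster ends up with the higher churn rate, so a new pupil with few sessions is assigned a higher churn probability than one with many. With real data the features would be multi-dimensional, and the number and form of components would need to be chosen and validated.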

We show that clusters inferred from behavioural data are characterised by non-trivially different churn rates. Moreover, a recognisable pattern is observed in the cluster sizes. This supports the effectiveness of the mixture model approach in uncovering the level of churn risk each pupil is at. We further investigate how features affect churn probability and argue that the mixture model can lead to minor overfitting issues in practical prediction. We also infer the temporal transition probabilities between states by studying the dynamics of behavioural changes.
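As an illustration of what temporal transition probabilities between states mean, the sketch below estimates them by simple counting. This is an assumed, simplified estimator rather than the method used in the project: it presumes each pupil has already been assigned a state label per time period. The sequences are invented and the function name `transition_matrix` is hypothetical.

```python
def transition_matrix(sequences, n_states):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # Count each consecutive pair (current state, next state).
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Normalise each row; a state that is never left gets a row of zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Invented example: state labels of three pupils over four periods.
pupil_states = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
]

matrix = transition_matrix(pupil_states, n_states=2)
for state, row in enumerate(matrix):
    print("from state", state, "->", row)
```

Row i of the result gives the estimated probabilities of moving from state i to each state in the next period. In a model with unobserved states, these probabilities would typically be inferred jointly with the states rather than counted from hard labels.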

Readers interested in this project can refer to the lay report and are very welcome to contact me by email for more details.