Best Practices in Cohort Analysis for Subscription Businesses

Performing a cohort analysis for recurring revenue businesses is not easy. It takes a trained professional to do it right and explain it. I am talking about this in the context of a company that has multiple years worth of data that wishes to understand key stats about their customers' behavior over time. Mainly, the retention, lifetime value, and customer acquisition cost as well as meta calculations and derivatives of those data points (LTV:CaC, CaC payback time)

Performing a retention cohort analysis requires knowledge of curves. Is the decline exponential or is it linear? Exponential means that the greatest number of customer leave early on and then those that do stay, tend to stay for longer. That will be important when determining how you calculate LTV. You may want to do a hard cutoff that calculates the average value of a customer over 2 years and/or 5 years rather assuming there is a linear retention rate. 
  • Define your cohorts: A cohort is a group of customers who share a common characteristic, such as sign-up date or purchase date. You can define your cohorts based on different criteria, such as the month or year of sign-up, or the product they purchased.
  • Collect and clean your data: Make sure you have complete and accurate data on customer sign-ups, purchases, and cancellations. This will allow you to track the behavior of each cohort over time.
  • Analyze retention rates: Retention rate is the percentage of customers who remain active subscribers over a specific period of time. To calculate retention rate, divide the number of active subscribers in a given cohort by the number of original subscribers in that cohort.
  • Compare cohort performance: Compare the retention rates of different cohorts to identify patterns and trends. This can help you identify which segments of customers are more likely to stay active and which are more likely to cancel.
  • Identify key metrics: Identify key metrics that are important for your business, such as customer lifetime value, revenue per customer, or churn rate. Track these metrics for each cohort over time to better understand their behavior.
  • Use multiple analysis methods: Use different analysis methods to gain insights into your data, such as A/B testing, survival analysis, or cohort-based predictive modeling.
  • Use visualization tools: Use data visualization tools to help you understand your data and communicate your findings to others.
  • Act on insights: Use the insights you gain from your cohort analysis to improve your business strategy and operations. For example, you may decide to improve the onboarding process for new customers or to focus on retaining high-value customers.
Here is a little more detail on calculating annual retention rates with monthly data:

  1. Collect your data: Gather data on customer sign-ups, purchases, and cancellations for each month. Make sure the data is complete and accurate.
  2. Group customers by sign-up month: Group customers by the month in which they signed up for the subscription. This will create your cohorts.
  3. Calculate monthly retention rates: For each cohort, calculate the retention rate for each month by dividing the number of active subscribers in that month by the number of original subscribers in the cohort.
  4. Calculate annual retention rate: To calculate the annual retention rate, you can take the average of the retention rate for each month of the year. For example, if you want to calculate the annual retention rate for the cohort of customers who signed up in January, you would take the average of the retention rate for each month of the year (e.g. January through December). 
  5. Repeat this step for all 12 cohorts. The easiest way to get an aggregate 1 year average is to look at the total customers the signed up in year 1, and then look at how many customers are left 12 months after each signup cohort start month. Divide that by the initial signups. Think of this more as annual periods rather than fiscal years. You can calculate period 2 annual retention rate averages based on the same methodology as above, except you are looking at the customers remaining 24 months after they joined each monthly cohort, and so on for every 12 month period. This effectively tells you on average how many customers will be left 1/2/3/4/etc... years after signing up.
  6. Compare retention rate by cohort: Compare the annual retention rates of different cohorts to identify patterns and trends. You can use visualization tools such as line chart or heatmap to make it more clear. This can help you identify which segments of customers are more likely to stay active and which are more likely to cancel.
For creating a 5 year forecast, this is my favorite subscription-based financial model that runs on retention cohorts. You can manually define the percentage of customers that are left at any given renewal period (whether that be monthly, every 6 months, every 12 months, or what have you after signup, or use the built-in linear curve.

Calculating Annual Revenue Retention Rates

This is pretty similar to the customer retention rate calculation except you are adding up the total revenue in the first 12 months of each monthly cohort and comparing that to the total revenue from the next 12 months. You continue to compare the next 12 months worth of revenue for each monthly cohort to the initial revenue of the first 12 months to get the annual revenue retention rate.

Calculating Average Lifetime Value for Historical Data

This is actually pretty simple. You can take the total revenue generated over a defined period of time for each monthly cohort. That is the cohorts total value. Divide that value by the total number of customers to get the average LTV per customer for the cohort. You can then take a weighted average of all the LTVs in a given yearly cohort, weighting the LTVs against the starting customer totals for each monthly cohort. You then have an average LTV for each year.

I am not going to get into predictive LTVs or predictive customers too much, but essentially you can fit the future data with a curve that represents the behavior of historical customer activity and then apply that curve to the initial value of more recent cohorts in order to predict the values of what a cohort may do in the next 3-5 years based on historical data. It could be an exponential decline curve or a linear declining curve, or whatever set of percentages makes the most sense to show how much revenue / customers are left over time from the initial number of signups.

About Calculating Retention for Fiscal Years

So, this is about figuring out the total customers that joined over a fiscal year, say 2023 you added 10,000 customers. Then, you want to say by the end of 2024 you have 8,000 of those original customers left. The issue with just dividing the 8,000 by the 10,000 and saying your annual retention rate is 80% is that some of those customers have been there for 24 months (those joining in January of 2023) and some of only been there for 12 months (those joining in December of 2023). So, you are getting a percentage that doesn't necessarily show you the yearly retention. You can use this, but it is more accurate to use the methodology above by looking at 12 month periods of each monthly cohort so you are truly capturing the amount of customers that remain 12 months after signing up.

The same idea goes for revenue. If you say you had 10,000 new customers in 2023 that produced $150,000 of revenue, and then in 2024 those same customers produced $110,000, you are only getting partial years for the customers revenue production. In 2024 you get month 13-24 of January 2023 cohort, but you are also getting month 2-13 of revenue of December 2023 cohort. Instead, you want to compare revenue for month 1-12 against revenue from month 13-24 for each monthly cohort and comparing the totals of those ranges.