I have a large customer dataset, it has things like Customer ID, Service ID, Product, etc. So the two ways we can measure churn are at a Customer-ID level, if the entire customer leaves and at a Service-ID level where maybe they cancel 2 out of 5 services.
The data looks like this, and as we can see
- Alligators stops being a customer at the end of Jan as they dont have any rows in Feb (CustomerChurn)
- Aunties stops being a customer at the end of Jan as they dont have any rows in Feb (CustomerChurn)
- Bricks continues with Apples and Oranges in Jan and Feb (ServiceContinue)
- Bricks continues being a customer but cancels two services at the end of Jan (ServiceChurn)
I am trying to write some code that creates the 'Churn' column.. I have tried
- To manually just grab lists of CustomerIDs and ServiceIDs using Set from Oct 2019, and then comparing that to Nov 2019, to find the ones that churned. This is not too slow but doesn't seem very Pythonic.
Thank you!
data = {'CustomerName': ['Alligators','Aunties', 'Bricks', 'Bricks','Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks'],
'ServiceID': [1009, 1008, 1001, 1002, 1003, 1004, 1001, 1002, 1001, 1002],
'Product': ['Apples', 'Apples', 'Apples', 'Bananas', 'Oranges', 'Watermelon', 'Apples', 'Bananas', 'Apples', 'Bananas'],
'Month': ['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
'Year': [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021],
'Churn': ['CustomerChurn', 'CustomerChurn', 'ServiceContinue', 'ServiceContinue', 'ServiceChurn', 'ServiceChurn','ServiceContinue', 'ServiceContinue', 'NA', 'NA']}
df = pd.DataFrame(data)
df