I am working on churn prediction. The observation and performance windows are sliced as below:

# use each user's last n months and create the user profile from them
#   |## observed period - user profile ##|## predict period - churn or not ##|
#   |<-        number_of_months        ->|<-     predict_period_months     ->|

In my concrete case the windows are:

number_of_months = 18
predict_period_months = 4

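def last_nth_month(x):
     # Keep only rows inside the observation window; the last predict_period_months are excluded.
     min_date = x['MONTH_ID'].max() - pd.DateOffset(months=(number_of_months + predict_period_months))
     max_date = x['MONTH_ID'].max() - pd.DateOffset(months=predict_period_months)
     # Both bounds are strict, so rows exactly at min_date or max_date are dropped.
     return x.loc[(x['MONTH_ID'] < max_date) & (x['MONTH_ID'] > min_date), :]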

The user profile is based on 18 months of past behaviour; the last 4 months were not used for training or testing.

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

As a result I got pretty good scores using LightGBM:

              precision    recall  f1-score   support

           0       0.91      0.95      0.93     49092
           1       0.95      0.91      0.93     49092

    accuracy                           0.93     98184
   macro avg       0.93      0.93      0.93     98184
weighted avg       0.93      0.93      0.93     98184

Accuracy = 0.9309256090605394

Is there any suggestion on how to use the information from the last 4 months to test the trained model?

zdz
  • How many records does each person have in your training set? 1 or 18? What is the MCC performance? – Chris Mar 07 '20 at 22:54
  • The training dataset has one row per person, built from that user's behaviour over the last 18 months. I didn't measure MCC. Some of the features are: ['gender', 'num_of_extra_situ', 'situ_count_0', 'situ_count_1', 'situ_count_2', 'situ_count_3', 'situ_count_4', 'num_of_adviser', 'num_of_agreement', 'avg_payment', 'proc_churn'] – zdz Mar 08 '20 at 07:45
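
For reference, the MCC asked about in the comments can be computed from the four confusion-matrix counts. The sketch below is a plain-Python illustration, not something taken from the question; the helper name `mcc` and the zero-denominator convention are assumptions. With scikit-learn available, `sklearn.metrics.matthews_corrcoef(y_true, y_pred)` gives the same quantity directly from label arrays.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0.0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Illustrative counts only, not the counts behind the report above.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
```

MCC ranges from -1 to 1, and a value near 0 means the predictions are no better than chance, even when accuracy looks high.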

1 Answer

Let's say that you have data between 01-2019 and 01-2022.

You can train your model using the data from 01-2019 to 08-2021, where you create your features from your 18-month window and your targets from the 4 months that follow it directly. Do some cross-validation and make sure that the model generalizes well.

Then you can mimic what happens in reality by creating another test set from the final months, 08-2021 to 01-2022: your features come from the last 18 months before 08-2021, and your target from the customers who actually churned between 08-2021 and 01-2022.
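
A minimal sketch of this out-of-time split, assuming a long table with one row per user and month. The column names `USER_ID` and `PAYMENT`, the helper `build_snapshot`, the two toy features, and the churn definition (no rows at all in the prediction window) are all assumptions made for illustration; only `MONTH_ID` and the 18/4-month windows come from the question.

```python
import pandas as pd

def build_snapshot(df, cutoff, obs_months=18, pred_months=4):
    """Features from the obs_months before `cutoff`, label from the pred_months after it."""
    cutoff = pd.Timestamp(cutoff)
    obs_start = cutoff - pd.DateOffset(months=obs_months)
    pred_end = cutoff + pd.DateOffset(months=pred_months)

    # Observation window: [obs_start, cutoff)
    obs = df[(df["MONTH_ID"] >= obs_start) & (df["MONTH_ID"] < cutoff)]
    # Prediction window: [cutoff, pred_end)
    pred = df[(df["MONTH_ID"] >= cutoff) & (df["MONTH_ID"] < pred_end)]

    # Hypothetical profile features: active months and mean payment per user.
    features = obs.groupby("USER_ID").agg(
        active_months=("MONTH_ID", "nunique"),
        avg_payment=("PAYMENT", "mean"),
    )
    # Assumed churn definition: the user has no rows in the prediction window.
    active_later = set(pred["USER_ID"])
    labels = pd.Series(
        [0 if user in active_later else 1 for user in features.index],
        index=features.index,
        name="churn",
    )
    return features, labels

# Toy data: user 1 is active for 8 months, user 2 stops after 4 months.
months = pd.date_range("2021-01-01", periods=8, freq="MS")
toy = pd.DataFrame(
    [(1, m, 10.0) for m in months] + [(2, m, 5.0) for m in months[:4]],
    columns=["USER_ID", "MONTH_ID", "PAYMENT"],
)

# Earlier cutoff for training and cross-validation, later cutoff as the out-of-time test.
X_dev, y_dev = build_snapshot(toy, "2021-04-01", obs_months=3, pred_months=2)
X_oot, y_oot = build_snapshot(toy, "2021-07-01", obs_months=3, pred_months=2)
```

In the toy data, neither user is labelled as churned at the earlier cutoff, while user 2 is labelled as churned at the later one. The model would be fitted on `X_dev, y_dev` and scored once on `X_oot, y_oot`, so the test labels come from months the model never saw.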

ZAKARYA ROUZKI
  • I think the end date should be 01-2022. How can the end of training set be 08-2021 while the last record is on 01-2021? – Mehdi Jun 13 '23 at 11:06