I am trying to predict the survival score and LTV for contractual, discrete policies (in insurance) in Python. I browsed a number of sites, but the examples I could find were almost all for the non-contractual setting (in retail).
I used the code below:
from lifelines import CoxPHFitter
from sklearn.model_selection import train_test_split
# After all feature selection and EDA
cph_train, cph_test = train_test_split(features, test_size=0.2)
cph = CoxPHFitter()
cph.fit(cph_train, duration_col='TIME', event_col='EVENT')
cph.print_summary()
Here TIME is the number of days between the policy start date and the current date for ACTIVE customers, and between the policy start date and the surrender date for non-ACTIVE customers.
EVENT is the indicator of whether the customer is ACTIVE or not ACTIVE.
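To make the two columns concrete, here is a minimal sketch of how they could be derived. It is not my exact preprocessing: the table, the column names START_DATE and SURRENDER_DATE, and the cut-off date are all hypothetical. It also assumes EVENT is coded 1 = surrendered and 0 = still ACTIVE, because lifelines reads event_col == 1 as "event observed"; with the opposite coding the model would treat ACTIVE policies as the event.

```python
import pandas as pd

# Hypothetical policy table; column names and dates are made up for illustration.
policies = pd.DataFrame({
    "POLICY_ID": [1, 2, 3],
    "START_DATE": pd.to_datetime(["2015-01-01", "2016-06-15", "2017-03-01"]),
    "SURRENDER_DATE": pd.to_datetime(["2016-01-01", None, None]),
})
as_of = pd.Timestamp("2018-01-01")  # assumed "current date" / observation cut-off

# EVENT = 1 if the surrender was observed, 0 if the policy is still ACTIVE (censored).
policies["EVENT"] = policies["SURRENDER_DATE"].notna().astype(int)

# TIME = days from start to surrender, or from start to the cut-off for ACTIVE policies.
end_date = policies["SURRENDER_DATE"].fillna(as_of)
policies["TIME"] = (end_date - policies["START_DATE"]).dt.days
```

Under this coding, the ACTIVE (censored) policies are the rows with EVENT == 0.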
After fitting the model I got a concordance of 0.7 (which I feel is OK).
From here, how do I proceed to get the survival score for the ACTIVE customers and their lifetime value (CLTV)?
Basically, I need to predict which customers are valuable, i.e. who will stay with the company for a long time.
I have added some code after going through some posts and suggestions by Cam.
from lifelines.utils import qth_survival_times
censored_subjects = features.loc[features['EVENT'] == 1]  # Selecting only the ACTIVE ones
unconditioned_sf = cph.predict_survival_function(censored_subjects)
# Condition each curve on the time already survived; clip(upper=1) replaces the removed clip_upper(1)
conditioned_sf = unconditioned_sf.apply(lambda c: (c / c.loc[features.loc[c.name, 'TIME']]).clip(upper=1))
predictions_75 = qth_survival_times(.75, conditioned_sf)
predictions_50 = qth_survival_times(.50, conditioned_sf)
values = predictions_75.T.join(data[['PREAMT','TIME']])
values50 = predictions_50.T.join(data[['PREAMT','TIME']])
values['RemainingValue'] = values['PREAMT'] * (values[0.75] - values['TIME'])
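To show what I think these steps compute, here is a small self-contained sketch on toy numbers, using plain pandas instead of lifelines. Everything in it is an assumption for illustration: the survival curves, the elapsed times, the per-day premium, and the helper first_time_at_or_below, which only mimics my understanding of qth_survival_times (the first time at which the curve drops to q or below).

```python
import pandas as pd

# Toy unconditioned survival curves S(t): rows are days, columns are policy ids.
# All numbers are made up for illustration.
sf = pd.DataFrame(
    {"A": [1.0, 0.9, 0.8, 0.6, 0.4, 0.2],
     "B": [1.0, 0.95, 0.9, 0.8, 0.7, 0.5]},
    index=[0, 100, 200, 300, 400, 500],
)
elapsed = pd.Series({"A": 100, "B": 200})     # TIME already survived per policy
premium = pd.Series({"A": 10.0, "B": 20.0})   # hypothetical premium per day (PREAMT)

# Condition on survival so far: S(t | T > elapsed) = S(t) / S(elapsed), capped at 1.
conditioned = sf.apply(lambda c: (c / c.loc[elapsed[c.name]]).clip(upper=1))

def first_time_at_or_below(curve, q):
    """First time at which the survival curve is <= q (inf if never reached)."""
    hit = curve[curve <= q]
    return hit.index[0] if len(hit) else float("inf")

# Total duration at which the conditioned survival probability first reaches 0.75.
total_75 = conditioned.apply(first_time_at_or_below, q=0.75)

# Subtract the time already elapsed to get remaining days, then value them.
remaining_75 = total_75 - elapsed
remaining_value = premium * remaining_75
```

In this toy setup the index of the conditioned curves is still total time since policy start, which is why the elapsed TIME is subtracted before multiplying by the premium.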
So what does the output denote?
0.5 PREAMT TIME
--- Does the number in column 0.5 denote the duration after which there is a 50% chance of the policy being closed?
0.75 PREAMT TIME
--- Similarly, does the number in column 0.75 denote the duration after which there is a 75% chance of the policy being closed?
RemainingValue
--- Is it the remaining amount to be paid?
And what is the next step after this?