I am trying to build a lift/gain chart for a model I built in sklearn. I am using this post as a reference: "How to build a lift chart (a.k.a gains chart) in Python?", but I am confused about how they did it. I thought lift was defined as the response we get with a model divided by the response we get with no model (random), but I guess I am wrong, because the calculation seems to be a lot more complex.
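To show what I mean by that definition, here is a minimal sketch of how I would have computed lift myself. The function name `simple_lift_by_decile`, the column names, and the toy data are all my own inventions, not from the referenced post, and I am assuming "response with no model" means the overall response rate.

```python
import numpy as np
import pandas as pd

def simple_lift_by_decile(y_true, y_score, n_bins=10):
    """Lift per bin = response rate in the bin / overall response rate."""
    df = pd.DataFrame({"actual": np.asarray(y_true), "score": np.asarray(y_score)})
    # Rank cases from highest to lowest predicted score.
    df = df.sort_values("score", ascending=False).reset_index(drop=True)
    # Assign each row to one of n_bins roughly equal-sized groups by rank.
    df["bin"] = np.arange(len(df)) * n_bins // len(df)
    overall_rate = df["actual"].mean()  # the "no model" baseline (assumption)
    out = df.groupby("bin")["actual"].agg(n="size", responders="sum")
    out["response_rate"] = out["responders"] / out["n"]
    out["lift"] = out["response_rate"] / overall_rate
    return out

# Hypothetical toy data: 10 cases, 4 responders.
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
table = simple_lift_by_decile(y_true, y_score, n_bins=5)
print(table)
```

With this toy data the top bin has a response rate of 1.0 against an overall rate of 0.4, so its lift is 2.5. That is the kind of number I expected the referenced code to produce.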
Here's what they did in blocks:
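import numpy as np
import pandas as pd
import sklearn.metrics

def calc_cumulative_gains(df: pd.DataFrame, actual_col: str, predicted_col: str, probability_col: str):
    # Rank cases from highest to lowest predicted probability (mutates df)
    df.sort_values(by=probability_col, ascending=False, inplace=True)
    # Keep only the cases the model predicted as positive
    subset = df[df[predicted_col] == True]
    rows = []
    for group in np.array_split(subset, 10):
        # normalize=False returns the count of correct predictions, not the fraction
        score = sklearn.metrics.accuracy_score(group[actual_col].tolist(),
                                               group[predicted_col].tolist(),
                                               normalize=False)
        rows.append({'NumCases': len(group), 'NumCorrectPredictions': score})
    lift = pd.DataFrame(rows)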
OK, so far so good, I am with them. Then I get confused:
    # Cumulative Gains Calculation
    lift['RunningCorrect'] = lift['NumCorrectPredictions'].cumsum()
    lift['PercentCorrect'] = lift.apply(
        lambda x: (100 / lift['NumCorrectPredictions'].sum()) * x['RunningCorrect'], axis=1)
Why is 'PercentCorrect' calculated so that the 'RunningCorrect' is in the numerator? What does this variable even mean?
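To make the question concrete, I ran the same arithmetic on made-up counts. The four groups and their numbers below are invented purely for illustration, and I wrote the calculation as a vectorised expression instead of `apply`. If I am reading it right, `PercentCorrect` ends up being the cumulative share of all correct predictions captured so far, as a percentage, but I am not sure that is the intended meaning.

```python
import pandas as pd

# Hypothetical per-group counts standing in for the 'lift' DataFrame.
lift = pd.DataFrame({
    "NumCases": [10, 10, 10, 10],
    "NumCorrectPredictions": [9, 7, 3, 1],
})

total_correct = lift["NumCorrectPredictions"].sum()              # 20
lift["RunningCorrect"] = lift["NumCorrectPredictions"].cumsum()  # 9, 16, 19, 20
# Same arithmetic as (100 / total) * RunningCorrect in the lambda.
lift["PercentCorrect"] = 100 * lift["RunningCorrect"] / total_correct
print(lift)
```

On these numbers `PercentCorrect` comes out as 45, 80, 95, 100, so it always climbs to 100 in the last group.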
Then we have:
    lift['CumulativeCorrectBestCase'] = lift['NumCases'].cumsum()
    lift['PercentCorrectBestCase'] = lift['CumulativeCorrectBestCase'].apply(
        lambda x: 100 if (100 / lift['NumCorrectPredictions'].sum()) * x > 100
        else (100 / lift['NumCorrectPredictions'].sum()) * x)
Again, I am confused about what PercentCorrectBestCase even means, because it seems to mix variables used as-is with running totals.
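Here is the same step on invented counts, in case it helps someone explain it. The data is hypothetical, and I replaced the `if/else` lambda with `clip(upper=100)`, which I believe is equivalent. My tentative reading is that it pretends every case in each group was correct and caps the cumulative percentage at 100.

```python
import pandas as pd

# Hypothetical per-group counts standing in for the 'lift' DataFrame.
lift = pd.DataFrame({
    "NumCases": [10, 10, 10, 10],
    "NumCorrectPredictions": [9, 7, 3, 1],
})

total_correct = lift["NumCorrectPredictions"].sum()            # 20
lift["CumulativeCorrectBestCase"] = lift["NumCases"].cumsum()  # 10, 20, 30, 40
# Running case count as a percentage of total correct predictions, capped at 100.
lift["PercentCorrectBestCase"] = (
    100 * lift["CumulativeCorrectBestCase"] / total_correct
).clip(upper=100)
print(lift)
```

With these counts the column is 50, 100, 100, 100: the numerator is a running total of cases while the denominator is the overall number of correct predictions, which is the mix that confuses me.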
The remainder of the code is just as confusing to me because I don't understand the variables that go into the calculation.
    lift['AvgCase'] = lift['NumCorrectPredictions'].sum() / len(lift)
    lift['CumulativeAvgCase'] = lift['AvgCase'].cumsum()
    lift['PercentAvgCase'] = lift['CumulativeAvgCase'].apply(
        lambda x: (100 / lift['NumCorrectPredictions'].sum()) * x)

    # Lift Chart
    lift['NormalisedPercentAvg'] = 1
    lift['NormalisedPercentWithModel'] = lift['PercentCorrect'] / lift['PercentAvgCase']

    return lift
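For what it is worth, this is what the last part gives on made-up counts. The numbers are invented and I vectorised the lambdas. It looks as if the "average case" spreads the correct predictions evenly over the groups, and the final column divides the model's cumulative percentage by that even spread, though I may be misreading it.

```python
import pandas as pd

# Hypothetical per-group counts of correct predictions.
lift = pd.DataFrame({"NumCorrectPredictions": [9, 7, 3, 1]})
total_correct = lift["NumCorrectPredictions"].sum()  # 20

# Cumulative percentage captured by the model: 45, 80, 95, 100.
lift["PercentCorrect"] = 100 * lift["NumCorrectPredictions"].cumsum() / total_correct

# Every group gets the same average count: 20 / 4 = 5.
lift["AvgCase"] = total_correct / len(lift)
lift["CumulativeAvgCase"] = lift["AvgCase"].cumsum()                      # 5, 10, 15, 20
lift["PercentAvgCase"] = 100 * lift["CumulativeAvgCase"] / total_correct  # 25, 50, 75, 100

# Ratio of the model's cumulative percentage to the even-spread baseline.
lift["NormalisedPercentWithModel"] = lift["PercentCorrect"] / lift["PercentAvgCase"]
print(lift)
```

Here the ratio starts at 1.8 for the first group and falls to exactly 1.0 in the last one, since both cumulative percentages reach 100 there.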
Could someone please help?