I am trying to build a lift/gain chart for a model I built in sklearn. I am using this post as a reference: "How to build a lift chart (a.k.a gains chart) in Python?", but I am confused about how they did it. I thought lift was defined as the response we get with a model divided by the response we get with no model (random), but I guess I am wrong, because the calculation seems to be a lot more complex.
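
To make that definition concrete, here is a minimal sketch of lift as I understand it, on made-up data. The function name `simple_decile_lift`, the column names, and the toy labels and scores are my own assumptions and do not come from the referenced post.

```python
import numpy as np
import pandas as pd

def simple_decile_lift(y_true, y_score, n_bins=10):
    """Lift per bin = response rate inside the bin / overall response rate."""
    df = pd.DataFrame({"actual": np.asarray(y_true), "score": np.asarray(y_score)})
    # Rank cases from highest to lowest score.
    df = df.sort_values("score", ascending=False).reset_index(drop=True)
    # Assign bins by rank; bin 0 holds the highest-scored cases.
    df["bin"] = np.arange(len(df)) * n_bins // len(df)
    # Response rate with "no model": the overall share of positives.
    overall_rate = df["actual"].mean()
    out = df.groupby("bin")["actual"].agg(n_cases="size", n_responders="sum")
    out["response_rate"] = out["n_responders"] / out["n_cases"]
    out["lift"] = out["response_rate"] / overall_rate
    return out

# Toy data (made up): 10 cases, 4 responders, already ordered by score.
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
y_score = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
table = simple_decile_lift(y_true, y_score, n_bins=5)
print(table)
```

With these numbers the top bin has a response rate of 1.0 against an overall rate of 0.4, so its lift is 2.5. This is the simple ratio I expected to see in the referenced code.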

Here's what they did in blocks:

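import numpy as np
import pandas as pd
import sklearn.metrics

def calc_cumulative_gains(df: pd.DataFrame, actual_col: str, predicted_col: str, probability_col: str):
    # Rank cases from highest to lowest predicted probability (note: sorts df in place).
    df.sort_values(by=probability_col, ascending=False, inplace=True)

    # Keep only the cases the model predicted as positive.
    subset = df[df[predicted_col] == True]

    rows = []
    # Split the ranked positives into 10 roughly equal groups.
    for group in np.array_split(subset, 10):
        # With normalize=False this is the count of correct predictions in the group.
        score = sklearn.metrics.accuracy_score(group[actual_col].tolist(),
                                               group[predicted_col].tolist(),
                                               normalize=False)

        rows.append({'NumCases': len(group), 'NumCorrectPredictions': score})

    lift = pd.DataFrame(rows)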

OK, so far so good, I am with them. Then I get confused:

    # Cumulative Gains Calculation
    lift['RunningCorrect'] = lift['NumCorrectPredictions'].cumsum()
    lift['PercentCorrect'] = lift.apply(
        lambda x: (100 / lift['NumCorrectPredictions'].sum()) * x['RunningCorrect'], axis=1)

Why is 'PercentCorrect' calculated so that the 'RunningCorrect' is in the numerator? What does this variable even mean?
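
My best guess, which I would like confirmed, is that 'PercentCorrect' is the running total of correct predictions expressed as a percentage of all correct predictions. The sketch below repeats the same arithmetic on made-up bucket counts; the numbers are hypothetical, and I use a vectorised expression instead of the original `apply`.

```python
import pandas as pd

# Made-up counts of correct predictions per bucket (hypothetical), best-scored bucket first.
lift = pd.DataFrame({"NumCorrectPredictions": [40, 30, 20, 10]})

# Running total of correct predictions across buckets.
lift["RunningCorrect"] = lift["NumCorrectPredictions"].cumsum()

# Same arithmetic as the lambda above: running total as a percentage of the grand total.
lift["PercentCorrect"] = 100 * lift["RunningCorrect"] / lift["NumCorrectPredictions"].sum()
print(lift)
```

If that reading is right, the last bucket always ends at 100, because by then every correct prediction has been counted.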

Then we have:

    lift['CumulativeCorrectBestCase'] = lift['NumCases'].cumsum()
    lift['PercentCorrectBestCase'] = lift['CumulativeCorrectBestCase'].apply(
        lambda x: 100 if (100 / lift['NumCorrectPredictions'].sum()) * x > 100
        else (100 / lift['NumCorrectPredictions'].sum()) * x)

Again, I am confused about what 'PercentCorrectBestCase' even means, because it seems to mix per-bucket variables "as is" with running totals.
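
To see what the formula produces, here is the same calculation on made-up counts. The numbers are hypothetical, and `clip(upper=100)` is my own substitute for the original `if/else` cap. My tentative reading is that the "best case" treats every case in a bucket as a correct prediction until all correct predictions are used up, but I am not sure that is the intent.

```python
import pandas as pd

# Made-up bucket sizes and correct-prediction counts (hypothetical).
lift = pd.DataFrame({
    "NumCases": [50, 50, 50, 50],
    "NumCorrectPredictions": [40, 30, 20, 10],
})
total_correct = lift["NumCorrectPredictions"].sum()

# Running total of cases seen so far, not of correct predictions.
lift["CumulativeCorrectBestCase"] = lift["NumCases"].cumsum()

# Scale by the total number of correct predictions and cap at 100.
lift["PercentCorrectBestCase"] = (
    100 * lift["CumulativeCorrectBestCase"] / total_correct
).clip(upper=100)
print(lift)
```

Here the curve reaches 100 after the second bucket, since 100 cases have been seen and there are only 100 correct predictions in total.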

The remainder of the code is just as confusing to me because I don't understand the variables that go into the calculation.

    lift['AvgCase'] = lift['NumCorrectPredictions'].sum() / len(lift)
    lift['CumulativeAvgCase'] = lift['AvgCase'].cumsum()
    lift['PercentAvgCase'] = lift['CumulativeAvgCase'].apply(
        lambda x: (100 / lift['NumCorrectPredictions'].sum()) * x)

    # Lift Chart
    lift['NormalisedPercentAvg'] = 1
    lift['NormalisedPercentWithModel'] = lift['PercentCorrect'] / lift['PercentAvgCase']

    return lift
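
For what it's worth, here is this last part repeated on made-up counts, so the intermediate values are visible. The numbers are hypothetical, and I use vectorised expressions in place of the original `apply` calls.

```python
import pandas as pd

# Made-up counts of correct predictions per bucket (hypothetical), best-scored bucket first.
lift = pd.DataFrame({"NumCorrectPredictions": [40, 30, 20, 10]})
total_correct = lift["NumCorrectPredictions"].sum()

# Model curve: cumulative share of correct predictions captured so far.
lift["PercentCorrect"] = 100 * lift["NumCorrectPredictions"].cumsum() / total_correct

# "Average" curve: every bucket is given an equal share of the total.
lift["AvgCase"] = total_correct / len(lift)
lift["PercentAvgCase"] = 100 * lift["AvgCase"].cumsum() / total_correct

# Ratio of the two curves: model versus the even-spread baseline.
lift["NormalisedPercentWithModel"] = lift["PercentCorrect"] / lift["PercentAvgCase"]
print(lift)
```

On these numbers the ratio goes 1.6, 1.4, 1.2, 1.0, which looks to me like a cumulative "model divided by no model" figure, but I would appreciate confirmation that this is the right interpretation.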

Could someone please help?

user3490622