Conditional Cumulative Sums in Pandas

Question

I am a former Excel power user repenting for his sins. I need help recreating a common calculation for me.

I am trying to calculate the performance of a loan portfolio. In the numerator, I am calculating the cumulative total of losses. In the denominator, I need the original balance of the loans included in the cumulative total.

I cannot figure out how to do a conditional groupby in Pandas to accomplish this. It is very simple in Excel, so I am hoping that I am overthinking it.

I could not find much on the issue on StackOverflow, but this was the closest: python pandas conditional cumulative sum

The thing I cannot figure out is that my conditions are based on values in the index and contained in columns

Below is my data:

| Loan    | Origination | Balance | NCO Date  | NCO | As of Date | Age     (Months) | NCO Age (Months) |
|---------|-------------|---------|-----------|-----|------------|--------------|------------------|
| Loan 1  | 1/31/2011   | 1000    | 1/31/2018 | 25  | 5/31/2019  | 100              | 84               |
| Loan 2  | 3/31/2011   | 2500    |           | 0   | 5/31/2019  | 98           |                  |
| Loan 3  | 5/31/2011   | 3000    | 1/31/2019 | 15  | 5/31/2019  | 96           | 92               |
| Loan 4  | 7/31/2011   | 2500    |           | 0   | 5/31/2019  | 94           |                  |
| Loan 5  | 9/30/2011   | 1500    | 3/31/2019 | 35  | 5/31/2019  | 92           | 90               |
| Loan 6  | 11/30/2011  | 2500    |           | 0   | 5/31/2019  | 90           |                  |
| Loan 7  | 1/31/2012   | 1000    | 5/31/2019 | 5   | 5/31/2019  | 88           | 88               |
| Loan 8  | 3/31/2012   | 2500    |           | 0   | 5/31/2019  | 86           |                  |
| Loan 9  | 5/31/2012   | 1000    |           | 0   | 5/31/2019  | 84           |                  |
| Loan 10 | 7/31/2012   | 1250    |           | 0   | 5/31/2019  | 82           |                  |

In Excel, I would calculate this total using the following formulas:

Outstanding Balance Line: =SUMIFS(Balance,Age (Months),Reference Age)

Cumulative NCO: =SUMIFS(NCO,Age (Months),>=Reference Age,NCO Age (Months),<=&Reference Age)

Data:

| Reference Age       | 85    | 90    | 95   | 100  
|---------------------|-------|-------|------|------
| Outstanding Balance | 16500 | 13000 | 6500 | 1000 
| Cumulative NCO      | 25    | 60    | 40   | 25

The goal here is to include things in Outstanding Balance that are old enough to have an observation for NCO. And NCOs are the total amount that have occurred up until that point for those loans outstanding.

EDIT:

I have gotten a calculation this way. But is this the most efficient?

age_bins = list(np.arange(85, 101, 5))
final_df = pd.DataFrame()
df.fillna(value=0, inplace=True)
df["NCO Age (Months)"] = df["NCO Age (Months)"].astype(int)

for x in age_bins:

    age = x

    nco = df.loc[(df["Age (Months)"] >= x) & (df["NCO Age (Months)"] <= x), "NCO"].sum()

    bal = df.loc[(df["Age (Months)"] >= x), "Balance"].sum()

    temp_df = pd.DataFrame(
        data=[[age, nco, bal]],
        columns=["Age", "Cumulative NCO", "Outstanding Balance"],
        index=[age],
    )

    final_df = final_df.append(temp_df, sort=True)

Sorry, I mislabeled it. The reference would be the Month (Age) in the Data Section. I'll edit the post — Russ W, Jun 14 '19 at 19:05
is excel cumulative sum the same as pandas/python? this is where I always fall flat, the same functions are different, for example python's round uses the bankers round (rounds down from 0.5) where as excel rounds up. Caused me some hiccups when re-writing VBA code in python! The basis for your question is good but I'm finding it hard to follow from your example to your output — Umar.H, Jun 15 '19 at 19:31

score 2 · Accepted Answer · edited Jun 17 '19 at 14:15

2

You use a complex conditions depending on variables. It is easy to find a vectorized way for simple cumulative sums, but I cannot imagine a nice way for the Cumulative NCO.

So I would revert to Python comprehensions:

data = [
    { 'Reference Age': ref,
      'Outstanding Balance': df.loc[df.iloc[:,6]>=ref,'Balance'].sum(),
      'Cumulative NCO': df.loc[(df.iloc[:,6]>=ref)&(df.iloc[:,7]<=ref),
                   'NCO'].sum() }
    for ref in [85, 90, 95, 100]]

result = pd.DataFrame(data).set_index('Reference Age').T

It produces:

Reference Age          85     90    95    100
Cumulative NCO          25     60    40    25
Outstanding Balance  16500  13000  6500  1000

edited Jun 17 '19 at 14:15

Russ W

99
4

answered Jun 14 '19 at 17:09

Serge Ballesta

143,923
11
122
252

This looks like it gets the balance right, but the NCO is wrong. It took the total of all NCO for column 85, despite only 1 NCO occurring within that time period. – Russ W Jun 14 '19 at 22:03
@RussW As I am not an expert in Excel formulas and as my Excel has French name for formulas I had not understood how cumulative MCO was computed. This answer is wrong and will be deleted unless I find a correct way. – Serge Ballesta Jun 15 '19 at 18:41
@RussW: This is less optimized but meet the requirements from your `SUMIFS`. – Serge Ballesta Jun 15 '19 at 18:52
Thanks! This looks like exactly what I need. – Russ W Jun 17 '19 at 14:15

score 0 · Answer 2 · answered Jun 14 '19 at 17:07

0

You could try to build groups of loans in a given age range using pd.cut and use groupby afterwards. Something like this:

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5], [7, 8, 9, 10, 11]], index=['age', 'value']).T
df['groups'] = pd.cut(df.age, [0, 1, 3, 5]) # define bins (0,1], (1,3], (3,5]
df.groupby('groups')['value'].sum()

answered Jun 14 '19 at 17:07

chuni0r

173
4

The problem is a groupby is just simply a sum, but as we move forward in time we will need to exclude certain values. In the example you can see that balance and cumulative NCO decline at the later stages. If we made $100k in loans but they haven't all aged to 90 months, we will understate our loss ratio at 90 months if we include them all because all the loans have not had a chance to go bad at month 90 yet. Thanks for the answer :) – Russ W Jun 14 '19 at 19:13

score 0 · Answer 3 · answered Jun 14 '19 at 17:10

Not sure that I completly follow the exact logic you are going for, but you can accomplish a sumifs with the combination of pandas query and groupby.

Example

import pandas as pd
import numpy as np

age = np.random.randint(85, 100, 50)
balance = np.random.randint(1000, 2500, 50)
nco = np.random.randint(85, 100, 50)

df = pd.DataFrame({'age': age, 'balance': balance, 'nco':nco})


df['reference_age'] = df['age'].apply(lambda x: 5 * round(float(x)/5))

outstanding_balance = (
   df
   .query('age >= reference_age')
   .groupby('reference_age')
   [['balance']]
   .sum()
   .rename(columns={'balance': 'Outstanding Balance'}
   )

cumulative_nco = (
   df
   .query('age < reference_age')
   .groupby('reference_age')
   [['nco']]
   .sum()
   .rename(columns={'nco': 'cumulative nco'})
   .cumsum()
   )


result = outstanding_balance.join(cumulative_sum).T

result

reference_age            85       90       95
Outstanding Balance  2423.0  16350.0  13348.0
cumulative nco          NaN    645.0   1107.0

Conditional Cumulative Sums in Pandas

3 Answers3

Example

result