calcualte numpy IRR with inconsistent frequencies across pandas data frame

Question

My questions is not around how to calculate IRR (internal rate of return) but rather given a data set similar to the below, how to best calculate IRR without waiting months for the result when the sample size dramatically increases.

I am using the np.irr function

Example of Data

import pandas as pd
import numpy as np

date_list =['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04','2018-01-05', '2018-01-06', '2018-01-07', '2018-01-14','2018-01-21', '2018-01-31','2018-02-08', '2018-02-28']
ids_list = [1,1,1,1,2,2,2,2,3,3,3,3]
flows_list = [ -10, 2, 2, 10, -50, 25, 20, 20, -100, 0,  3, 150]
df = pd.DataFrame(list(zip(date_list,ids_list,flows_list)), columns=['Date','ID','Flow'])
df['Date'] = pd.to_datetime(df['Date'],format='%Y-%m-%d')

Yields the Below Dataset

In [144]: df
Out[144]:
         Date  ID  Flow
0  2018-01-01   1   -10
1  2018-01-02   1     2
2  2018-01-03   1     2
3  2018-01-04   1    10
4  2018-01-05   2   -50
5  2018-01-06   2    25
6  2018-01-07   2    20
7  2018-01-14   2    20
8  2018-01-21   3  -100
9  2018-01-31   3     0
10 2018-02-08   3     3
11 2018-02-28   3   150

Data Explanation

Date is the day the cash flowed in or out.
ID is essentially the unique id of each investment made.
Flow is the cash flow of that ID (investment).
I need use a daily frequency as my input for np.irr

IF I do a simple pandas.groupby

In [145]: df.groupby(['ID'])['Flow'].agg(np.irr)
Out[145]:
ID
1    0.141962
2    0.150155
3    0.153450
Name: Flow, dtype: float64

So for ID 1 the np.irr returned makes sense as my frequency is consistent.

However, for the rest you'll see the dates are not equally spaced by day.

Example of 'Manually' calculating np.irr for ID 3

df.loc[df.ID ==3]['Date'].apply(lambda x: (x - min(df.loc[df.ID ==3]['Date'])).days)

8      0
9     10
10    18
11    38
Name: Date, dtype: int64

Can see above that each cashflow occurs at the beginning, on the 10th, 18th & Finally on the 38th day.

cfs = np.zeros(39)
cfs[[0,10,18,38]] = df.loc[df.ID ==3]['Flow'].values

np.irr(cfs)

This yields the actual np.irr for ID 3:

Out[155]: 0.011386397119650837

So my question is:

How to calculate np.irr across a pandas.DataFrame where the cashflow frequencies are inconsistent in an optimal manner ?

score 1 · Accepted Answer · answered Nov 06 '18 at 10:38

This seems to be the most optimal and accurate way I could find. Avoiding for loops!!

Load Example Data

import pandas as pd
import numpy as np

date_list =['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04','2018-01-05', '2018-01-06', '2018-01-07', '2018-01-14','2018-01-21', '2018-01-31','2018-02-08', '2018-02-28']
ids_list = [1,1,1,1,2,2,2,2,3,3,3,3]
flows_list = [ -10, 2, 2, 10, -50, 25, 20, 20, -100, 0,  3, 150]
df = pd.DataFrame(list(zip(date_list,ids_list,flows_list)), columns=['Date','ID','Flow'])
df['Date'] = pd.to_datetime(df['Date'],format='%Y-%m-%d')

Re-index data frame to include 0's cashflows

def reindex_by_date_and_fill(df,groupby_column='ID',value_column='Flow'):
    dates = pd.date_range(df.index.min(), df.index.max())
    return pd.concat([df.reindex(dates)[groupby_column].ffill(),df.reindex(dates,fill_value=0)[value_column]],axis=1)

df_test = df.set_index(['Date']).groupby(['ID'],as_index=False).apply(reindex_by_date_and_fill).reset_index(0,drop=True)

I got this idea from the below two posts:

Add missing dates to pandas dataframe

Pandas reindex dates in Groupby

Basically it fills in the days which are missing and populates the cash flow as a zero. This allows you to get a daily frequency across each investment while still maintaining the period in which the cash flows were paid back.

In [54]: df_test.head(10)
Out[54]:
             ID  Flow
2018-01-01  1.0   -10
2018-01-02  1.0     2
2018-01-03  1.0     2
2018-01-04  1.0    10
2018-01-05  2.0   -50
2018-01-06  2.0    25
2018-01-07  2.0    20
2018-01-08  2.0     0
2018-01-09  2.0     0
2018-01-10  2.0     0

This Allows you then to use groupby

In [60]: df_test.groupby(['ID'])['Flow'].agg(np.irr)
Out[60]:
ID
1.0    0.141962
2.0    0.082212
3.0    0.011386
Name: Flow, dtype: float64

calcualte numpy IRR with inconsistent frequencies across pandas data frame

Example of Data

So my question is:

1 Answers1