My questions is not around how to calculate IRR (internal rate of return) but rather given a data set similar to the below, how to best calculate IRR without waiting months for the result when the sample size dramatically increases.
I am using the np.irr
function
Example of Data
import pandas as pd
import numpy as np
date_list =['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04','2018-01-05', '2018-01-06', '2018-01-07', '2018-01-14','2018-01-21', '2018-01-31','2018-02-08', '2018-02-28']
ids_list = [1,1,1,1,2,2,2,2,3,3,3,3]
flows_list = [ -10, 2, 2, 10, -50, 25, 20, 20, -100, 0, 3, 150]
df = pd.DataFrame(list(zip(date_list,ids_list,flows_list)), columns=['Date','ID','Flow'])
df['Date'] = pd.to_datetime(df['Date'],format='%Y-%m-%d')
Yields the Below Dataset
In [144]: df
Out[144]:
Date ID Flow
0 2018-01-01 1 -10
1 2018-01-02 1 2
2 2018-01-03 1 2
3 2018-01-04 1 10
4 2018-01-05 2 -50
5 2018-01-06 2 25
6 2018-01-07 2 20
7 2018-01-14 2 20
8 2018-01-21 3 -100
9 2018-01-31 3 0
10 2018-02-08 3 3
11 2018-02-28 3 150
Data Explanation
Date
is the day the cash flowed in or out.ID
is essentially the unique id of each investment made.Flow
is the cash flow of thatID
(investment).I need use a daily frequency as my input for
np.irr
IF I do a simple pandas.groupby
In [145]: df.groupby(['ID'])['Flow'].agg(np.irr)
Out[145]:
ID
1 0.141962
2 0.150155
3 0.153450
Name: Flow, dtype: float64
So for ID
1 the np.irr
returned makes sense as my frequency is consistent.
However, for the rest you'll see the dates are not equally spaced by day.
Example of 'Manually' calculating np.irr
for ID 3
df.loc[df.ID ==3]['Date'].apply(lambda x: (x - min(df.loc[df.ID ==3]['Date'])).days)
8 0
9 10
10 18
11 38
Name: Date, dtype: int64
Can see above that each cashflow occurs at the beginning, on the 10th, 18th & Finally on the 38th day.
cfs = np.zeros(39)
cfs[[0,10,18,38]] = df.loc[df.ID ==3]['Flow'].values
np.irr(cfs)
This yields the actual np.irr
for ID
3:
Out[155]: 0.011386397119650837
So my question is:
How to calculate
np.irr
across apandas.DataFrame
where the cashflow frequencies are inconsistent in an optimal manner ?