I have the following dataframe:
data = {'month': {0: Timestamp('2019-01-01 00:00:00'),
1: Timestamp('2019-02-01 00:00:00'),
2: Timestamp('2019-03-01 00:00:00'),
3: Timestamp('2019-04-01 00:00:00'),
4: Timestamp('2019-05-01 00:00:00')},
'base_expenses': {0: 200.0, 1: 200.0, 2: 200.0, 3: 200.0, 4: 200.0},
'base_contribution': {0: 100.0, 1: 100.0, 2: 100.0, 3: 100.0, 4: 100.0}}
df = pd.DataFrame(data)
df
month base_expenses base_contribution
0 2019-01-01 200.0 100.0
1 2019-02-01 200.0 100.0
2 2019-03-01 200.0 100.0
3 2019-04-01 200.0 100.0
4 2019-05-01 200.0 100.0
This data will represent an investment which has additional contributions added to it every month and which grows by a certain percentage every month.
So for example the starting balance of the investment is 50000
. Every month we add base_contribution
to the balance. Lastly, every month the balance grows by 0.6%
.
I can calculate all this using a loop as follows:
CURRENT_BALANCE = 50000
MONTHLY_INVESTMENT_RETURN = 0.006
df['base_balance'] = CURRENT_BALANCE
for index, row in df.iterrows():
if index == 0:
balance = row['base_contribution'] + row['base_balance']
balance += balance * MONTHLY_INVESTMENT_RETURN
df.loc[row.name, 'base_balance'] = balance
else:
balance = row['base_contribution'] + df.loc[row.name - 1, 'base_balance']
balance += balance * MONTHLY_INVESTMENT_RETURN
df.loc[row.name, 'base_balance'] = balance
The result would be:
month base_expenses base_contribution base_balance
0 2019-01-01 200.0 100.0 50422.344909
1 2019-02-01 200.0 100.0 50847.407197
2 2019-03-01 200.0 100.0 51275.204349
3 2019-04-01 200.0 100.0 51705.753960
4 2019-05-01 200.0 100.0 52139.073741
The real data that I am working with is very large and so I would prefer to avoid this looping approach if possible. Is there a way to do this in a vectorised manor or without looping?