Numpy
solution - you can use broadcasting for avoid loops for improve performance:
df = pd.DataFrame({"age" : [0, 5, 10, 15, 20], "income": [5, 13, 23, 18, 12]})
interest = 1.03
age = df['age'].to_numpy()
Use power with subtracted values of mask:
arr = interest ** (age[:, None] - age )
print (arr)
[[1. 0.86260878 0.74409391 0.64186195 0.55367575]
[1.15927407 1. 0.86260878 0.74409391 0.64186195]
[1.34391638 1.15927407 1. 0.86260878 0.74409391]
[1.55796742 1.34391638 1.15927407 1. 0.86260878]
[1.80611123 1.55796742 1.34391638 1.15927407 1. ]]
Then set 0 to upper triangle:
arr = np.where(np.triu(np.ones(arr.shape, dtype=bool)), 0, arr)
print (arr)
[[0. 0. 0. 0. 0. ]
[1.15927407 0. 0. 0. 0. ]
[1.34391638 1.15927407 0. 0. 0. ]
[1.55796742 1.34391638 1.15927407 0. 0. ]
[1.80611123 1.55796742 1.34391638 1.15927407 0. ]]
Set 1 to diagonal:
np.fill_diagonal(arr, 1)
print (arr)
[[1. 0. 0. 0. 0. ]
[1.15927407 1. 0. 0. 0. ]
[1.34391638 1.15927407 1. 0. 0. ]
[1.55796742 1.34391638 1.15927407 1. 0. ]
[1.80611123 1.55796742 1.34391638 1.15927407 1. ]]
Multiple by column income and sum per rows:
print (arr * df['income'].to_numpy())
[[ 5. 0. 0. 0. 0. ]
[ 5.79637037 13. 0. 0. 0. ]
[ 6.7195819 15.07056297 23. 0. 0. ]
[ 7.78983708 17.47091293 26.66330371 18. 0. ]
[ 9.03055617 20.25357642 30.91007672 20.86693334 12. ]]
df['new'] = (arr * df['income'].to_numpy()).sum(axis=1)
print (df) age income new
0 0 5 5.000000
1 5 13 18.796370
2 10 23 44.790145
3 15 18 69.924054
4 20 12 93.061143
Performance: For 5k rows, apply
are loops under the hood, so slow (best avoid it)
df = pd.DataFrame({"age" : [0, 5, 10, 15, 20], "income": [5, 13, 23, 18, 12]})
df = pd.concat([df] * 1000, ignore_index=True)
In [292]: %%timeit
...: age = df['age'].to_numpy()
...:
...: arr = interest ** (age[:, None] - age )
...: arr = np.where(np.triu(np.ones(arr.shape, dtype=bool)), 0, arr)
...: np.fill_diagonal(arr, 1)
...: df['new'] = (arr * df['income'].to_numpy()).sum(axis=1)
...:
1.39 s ± 69.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [293]: %%timeit
...: df['sumproduct'] = (df['age'].expanding().apply(lambda x: sum(df.loc[:x.index[-1], 'income'] * interest**(x.iloc[-1]-x))))
...:
5.13 s ± 411 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)