I'm trying to calculate the column values of a pandas DataFrame "recursively".
Suppose there are data for two different days, each with 10 observations. I want to calculate a variable r for which only the first value on each day is given. The remaining 2*9 entries have to be computed so that every value depends on the previous entry of r and on one additional 'contemporaneous' variable 'x'.
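To make the recursion concrete, here is a minimal plain-Python sketch for a single day. It assumes the update rule is the one used in my attempts below, r[t] = (1 + 0.25 * x[t]) * (1 + r[t-1]), and it only uses the first three x values of 2014-01-02 for brevity:

```python
# First three x values of 2014-01-02 (shortened for illustration).
x = [0.00275, 0.00285, 0.0031]
# Only the first r of the day is known.
r = [0.00158]

for t in range(1, len(x)):
    # Each new r depends on the previous r and the current x.
    r.append((1 + 0.25 * x[t]) * (1 + r[t - 1]))
```

This is exactly the dependency that a single vectorized shift(1) cannot express, because r[t-1] does not exist until the previous step has been computed.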
The first problem is that I want to perform the calculations for each day individually, i.e. I'd like to use the pandas groupby() function for all my calculations. But when I try to subset the data and use the shift(1) function, I only get "NaN" entries:
data.groupby(data.index)['r'] = ( (1+data.groupby(data.index)['x']*0.25) * (1+data.groupby(data.index)['r'].shift(1)))
For my second approach, I used a for loop to iterate through the index (dates):
for i in range(2,21):
    data[data['rank'] == i]['r'] = ( (1+data[data['rank'] == i]['x']*0.25) * (1+data[data['rank'] == i]['r'].shift(1)) )
but still, that doesn't work for me. Is there a way to perform such a calculation on DataFrames? Maybe something like rolling apply?
Data:
import numpy as np
import pandas as pd

# Unknown r values are stored as real NaN (np.nan) so the column stays numeric.
df = pd.DataFrame({
    'rank' : [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10],
    'x' : [0.00275,0.00285,0.0031,0.0036,0.0043,0.0052,0.0063,0.00755,0.00895,0.0105,0.0027,0.00285,0.0031,0.00355,0.00425,0.0051,0.00615,0.00735,0.00875,0.0103],
    'r' : [0.00158] + [np.nan]*9 + [0.001485] + [np.nan]*9
}, index=['2014-01-02']*10 + ['2014-01-03']*10)
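
For reference, this is a plain-loop sketch of the result I'm after, not necessarily the idiomatic pandas way. It assumes the rows of each day are contiguous and already ordered by rank, and that the unknown r values are real NaN rather than strings. The names `demo` and `fill_r` are made up for this illustration, and the sample is shortened to 3 observations per day:

```python
import numpy as np
import pandas as pd

# Hypothetical shortened sample: 3 observations per day instead of 10.
demo = pd.DataFrame(
    {
        "rank": [1, 2, 3, 1, 2, 3],
        "x": [0.00275, 0.00285, 0.0031, 0.0027, 0.00285, 0.0031],
        "r": [0.00158, np.nan, np.nan, 0.001485, np.nan, np.nan],
    },
    index=["2014-01-02"] * 3 + ["2014-01-03"] * 3,
)

def fill_r(x, r, weight=0.25):
    """Return a copy of r filled via r[t] = (1 + weight * x[t]) * (1 + r[t-1])."""
    out = np.array(r, dtype=float)
    for t in range(1, len(out)):
        out[t] = (1 + weight * x[t]) * (1 + out[t - 1])
    return out

# Process each day separately. sort=False keeps the days in row order, so the
# concatenated pieces line up with the original rows.
pieces = [
    fill_r(day["x"].to_numpy(), day["r"].to_numpy())
    for _, day in demo.groupby(level=0, sort=False)
]
demo["r"] = np.concatenate(pieces)
```

After this runs, the first r of each day is unchanged and every later r is built from the previous one within the same day. What I'm looking for is a way to get the same result with groupby and something like a rolling apply instead of the explicit loop.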