I have a dataframe - pastebin for minimium code to run
df_dict = {
'A': [1, 2, 3, 4, 5],
'B': [5, 2, 3, 1, 5],
'out': np.nan
}
df = pd.DataFrame(df_dict)
I am currently performing some row by row calculations by doing the following:
def transform(row):
length = 2
weight = 5
row_num = int(row.name)
out = row['A'] / length
if (row_num >= length):
previous_out = df.at[ row_num-1, 'out' ]
out = (row['B'] - previous_out) * weight + previous_out
df.at[row_num, 'out'] = out
df.apply( lambda x: transform(x), axis=1)
This yield the correct result:
A B out
0 1 5 0.5
1 2 2 1.0
2 3 3 11.0
3 4 1 -39.0
4 5 5 181.0
The breakdown for the correct calculation is as follows:
A B out
0 1 5 0.5
out = a / b
1 2 2 1.0
out = a / b
row_num >= length:
2 3 3 11.0
out = (b - previous_out) * weight + previous_out
out = (3 - 1) * 5 + 1 = 11
3 4 1 -39.0
out = (1 - 11) * 5 + 11 = 39
4 5 5 181.0
out = (5 - (-39)) * 5 + (-39) = 181
Executing this across many columns and looping is slow so I would like to optimize taking advantage of some kind of vectorization if possible.
My current attempt looks like this:
df['out'] = df['A'] / length
df[length:]['out'] = (df[length:]['B'] - df[length:]['out'].shift() ) * weight + df[length:]['out'].shift()
This is not working and I'm not quite sure where to go from here.
Pastebin of the above code to just copy/paste into a file and run