0

Is there a way to vectorize this? Currently I an looping through the dataframe as column values depend on previous values of other columns.

enter image description here

Column F depends on previous version of G

Column G also depends current value of F which as stated previously depends on previous value of G

Column G also depends current value of F which as stated previously depends on previous value of G

import pandas as pd

data = [[1,2131000,2131000,77.13],
[1,269000,2400000,79.25],
[1,1340000,3740000,81],
[1,268000,4008000,83.75],
[-1,1073000,2935000,85],
[1,269000,3204000,75]]

df = pd.DataFrame(data,columns=['Flag','Q','Cumsum Q','P'])

df['P*Q'] = df['P']*df['Q']
df.loc[0,'Adj P*Q'] = df.loc[0, 'P*Q']
df.loc[0,'Adj P'] = df.loc[0, 'P']

for index, row in df.iloc[1:,].iterrows():
        df.loc[index,'Adj P*Q'] = np.where(df.loc[index,'Flag'] == 1, df.loc[index-1,'Adj P*Q'] + df.loc[index,'P*Q'] * df.loc[index,'Flag'], df.loc[index-1,'Adj P'] * df.loc[index,'Cumsum Q'])
        df.loc[index,'Adj P'] = np.where(df.loc[index,'Flag'] == 1, df.loc[index,'Adj P*Q'] / df.loc[index,'Cumsum Q'],  df.loc[index-1,'Adj P'])
dingo
  • 317
  • 1
  • 11
  • Write a function and use `df.apply`. – MattDMo Sep 04 '22 at 19:56
  • 2
    Can you please provide the code that you're currently using, or some pseudo-code to demonstrate what you're trying to do? I can't understand excel, and I am sure I'm not the only person here who can't. – Adam Oppenheimer Sep 04 '22 at 19:56
  • Look into [Series.shift](https://pandas.pydata.org/docs/reference/api/pandas.Series.shift.html) to run needed arithmetic on columns without looping. – Parfait Sep 04 '22 at 20:11
  • Yes that can probably be vectorized, but as noted above one would need a fully reproducible example to give it a try... – mozway Sep 04 '22 at 20:31
  • fully reproducible example added. Basically the 2 columns in the loop depend on previous values to be computed – dingo Sep 04 '22 at 22:32

1 Answers1

1

Now that you've posted some example code, I think this is sufficiently complicated that if it's possible to vectorize, it would take some creativity to figure out how.

If the code you posted is too slow for your data, I would recommend looking into something like numba to speed it up.

Also, I doubt using .apply would do anything - please see here (which also mentions using numba as an option to speed up loops).

Finally, I would recommend cleaning up your code since the code you posted is relatively difficult to follow. This code gives the same results as yours on the sample data, but is easier to understand:

for idx, row in df.iloc[1:,].iterrows():
    if row['Flag'] == 1:
        df.loc[idx, 'Adj P*Q'] = df.loc[idx - 1, 'Adj P*Q'] + df.loc[idx, 'P*Q']
        df.loc[idx, 'Adj P'] = df.loc[idx, 'Adj P*Q'] / df.loc[idx, 'Cumsum Q']
    else:
        df.loc[idx, 'Adj P*Q'] = df.loc[idx - 1, 'Adj P'] * df.loc[idx, 'Cumsum Q']
        df.loc[idx, 'Adj P'] = df.loc[idx - 1, 'Adj P']