I have a pandas dataframe in python.

I need to iterate over the rows and calculate a value for each one, where each row's value depends on the value calculated for the previous row.

Right now I am doing it using iterrows():

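value = 1000
df['calculated_column'] = 0

# Assumes the default integer index (0, 1, 2, ...), so index - 1 is the previous row.
for index, row in df.iterrows():
    if index == 0:
        df.loc[index, 'calculated_column'] = value
    else:
        df.loc[index, 'calculated_column'] = df.loc[index - 1, 'calculated_column'] + df.loc[index, 'column_to_sum']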

So, it is going to be something like this:

row 1 => df['calculated_column'] = 1000
row 2 => df['calculated_column'] = 1000 + df['column_to_sum'] = 1100
row 3 => df['calculated_column'] = 1100 + df['column_to_sum'] = 1200

I have read that iterating over a pandas DataFrame with iterrows() should be avoided: How to iterate over rows in a DataFrame in Pandas

How can I do this without iterrows()? I have tried using the apply function, but I don't know how to use it for this.

J.C Guzman

1 Answer

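You can use cumsum, which computes the running total without an explicit loop:

import pandas as pd

df = pd.DataFrame({'x': [20, 30, 50, 50, 35]})
# Add the starting value to the running total of x.
df['y'] = 1000 + df['x'].cumsum()
print(df)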

    x     y
0  20  1020
1  30  1050
2  50  1100
3  50  1150
4  35  1185
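
Note that the snippet above adds the first row's x to the starting value, whereas the question's expected output leaves the first row at exactly 1000. A minimal sketch of that variant follows, assuming the column is named column_to_sum as in the question. The name start_value and the sample data are introduced here only for illustration. Subtracting the first element from the cumulative sum keeps row 0 at the starting value, and the small loop is there only to cross-check the result against the recurrence.

```python
import pandas as pd

start_value = 1000  # hypothetical name for the question's `value`
df = pd.DataFrame({'column_to_sum': [20, 30, 50, 50, 35]})  # sample data

# Running total of every row except the first, so row 0 stays at start_value.
running = df['column_to_sum'].cumsum() - df['column_to_sum'].iloc[0]
df['calculated_column'] = start_value + running

# Reference loop implementing the same recurrence, used only as a cross-check.
expected = [start_value]
for x in df['column_to_sum'].iloc[1:]:
    expected.append(expected[-1] + x)

assert df['calculated_column'].tolist() == expected
print(df)
```

If the values should be subtracted rather than added, the same idea should work with start_value - running.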
Daniel R
  • row 1 => df['calculated_column'] = 1000 row 2 => df['calculated_column'] = 1000 + df['column_to_sum'] = 1100 row 3 => df['calculated_column'] = 1100 + df['column_to_sum'] = 1200 .... – J.C Guzman Jul 14 '20 at 12:04
  • I see, now it is clear what you want. You can use `cumsum` for that. Edited my answer. – Daniel R Jul 14 '20 at 12:17