avoid looping in pandas dataframe - python

Question

I have a pandas DataFrame in Python.

I need to iterate over each row and calculate a value, and based on that value I have to calculate the next row's value.

Right now I am doing it using iterrows():

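value = 1000
df['calculated_column'] = 0

# Assumes the default integer index (0, 1, 2, ...), so index - 1 is the previous row.
for index, row in df.iterrows():
    if index == 0:
        # The first row holds the starting value.
        df.loc[index, 'calculated_column'] = value
    else:
        # Each later row adds its own column_to_sum to the previous result.
        df.loc[index, 'calculated_column'] = df.loc[index - 1, 'calculated_column'] + df.loc[index, 'column_to_sum']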

So the result should look something like this (here column_to_sum is 100 in each row):

row 1 => df['calculated_column'] = 1000
row 2 => df['calculated_column'] = 1000 + df['column_to_sum'] = 1100
row 3 => df['calculated_column'] = 1100 + df['column_to_sum'] = 1200
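
For reference, a minimal reproducible sketch of a loop that produces the values listed above. The three-row frame with column_to_sum equal to 100 in every row is hypothetical sample data, and the default integer index is assumed:

```python
import pandas as pd

# Hypothetical sample data: column_to_sum is 100 in every row.
df = pd.DataFrame({'column_to_sum': [100, 100, 100]})

value = 1000
df['calculated_column'] = 0

# Assumes the default RangeIndex (0, 1, 2, ...), so index - 1 is the previous row.
for index, row in df.iterrows():
    if index == 0:
        # The first row holds the starting value.
        df.loc[index, 'calculated_column'] = value
    else:
        # Each later row adds its own column_to_sum to the previous result.
        df.loc[index, 'calculated_column'] = (
            df.loc[index - 1, 'calculated_column'] + row['column_to_sum']
        )

print(df['calculated_column'].tolist())  # [1000, 1100, 1200]
```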

I have read that using iterrows to iterate over a pandas DataFrame should be avoided: How to iterate over rows in a DataFrame in Pandas

How could I do this without iterrows? I have tried doing it with the apply function, but I don't know how to use it.

Daniel R · Accepted Answer · 2020-07-14T12:17:08.890


You can use cumsum:

import pandas as pd

df = pd.DataFrame({'x': [20, 30, 50, 50, 35]})
# Running total of x, offset by the starting value 1000
df['y'] = 1000 + df['x'].cumsum()
print(df)

    x     y
0  20  1020
1  30  1050
2  50  1100
3  50  1150
4  35  1185
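
Note that in the output above the first row already includes its own x (1020), whereas the illustration in the question starts at exactly 1000. If the first row should equal the starting value, one possible adjustment is to subtract the first element from the cumulative sum. This is only a sketch, assuming the first row's own x is meant to be skipped; start is a hypothetical name for the initial value:

```python
import pandas as pd

df = pd.DataFrame({'x': [20, 30, 50, 50, 35]})
start = 1000  # hypothetical name for the initial value

# cumsum() includes the first x; subtracting it makes row 0 equal to start,
# and every later row adds only its own x to the previous result.
df['y'] = start + df['x'].cumsum() - df['x'].iloc[0]

print(df)
```

With the sample data this gives y = 1000, 1030, 1080, 1130, 1165, which matches the recurrence in the question's loop (previous result plus the current row's value).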

edited Jul 14 '20 at 12:17

answered Jul 14 '20 at 11:12

Daniel R


row 1 => df['calculated_column'] = 1000; row 2 => df['calculated_column'] = 1000 + df['column_to_sum'] = 1100; row 3 => df['calculated_column'] = 1100 + df['column_to_sum'] = 1200; ... – J.C Guzman Jul 14 '20 at 12:04
I see, now it is clear what you want. You can use `cumsum` for that. Edited my answer. – Daniel R Jul 14 '20 at 12:17
