
I am looking for an efficient way to compute a cumulative sum over a pandas Series.

>>> import pandas as pd
>>> df=pd.DataFrame()
>>> df['a']=[1,3,1,4,2,5,3,8]
>>> df
       a
    0  1
    1  3
    2  1
    3  4
    4  2
    5  5
    6  3
    7  8

The expected output:

df
       a   b
    0  1   1
    1  3   4
    2  1   5
    3  4   9
    4  2  11
    5  5  16
    6  3  19
    7  8  27

Each b[i] equals sum(a[j] for j<=i).

My current approach is:

df['b'] = df['a']
# Add the previous running total to each element in turn
for i in range(len(df) - 1):
    df.loc[i + 1, 'b'] += df.loc[i, 'b']

This is not concise enough, and I would like to get rid of the loop. Any advice is welcome.

MaNKuR
Garvey

1 Answer

df['b'] = df.a.cumsum()
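
For completeness, here is a minimal self-contained sketch of the same call, using the sample data from the question. The only assumption is that pandas is installed and imported as `pd`; the column names `a` and `b` are taken from the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"a": [1, 3, 1, 4, 2, 5, 3, 8]})

# Series.cumsum returns the running total: b[i] = a[0] + ... + a[i]
df["b"] = df["a"].cumsum()

print(df["b"].tolist())  # [1, 4, 5, 9, 11, 16, 19, 27]
```

This should match the expected `b` column in the question, with no explicit Python loop.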

Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.cumsum.html

John Zwinck