73

I have a DataFrame like this:

df:

   fruit   val1  val2
0  orange    15     3
1  apple     10    13
2  mango      5     5

How do I get Pandas to give me a cumulative sum and percentage column on only val1?

Desired output:

df_with_cumsum:

   fruit   val1  val2  cum_sum  cum_perc
0  orange    15     3       15     50.00
1  apple     10    13       25     83.33
2  mango      5     5       30    100.00

I tried df.cumsum(), but it's giving me this error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Brian Tompsett - 汤莱恩
ComputerFellow

3 Answers

127
df['cum_sum'] = df['val1'].cumsum()
df['cum_perc'] = 100*df['cum_sum']/df['val1'].sum()

This will add the columns to df. If you want a copy, copy df first and then do these operations on the copy.
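As a minimal, self-contained sketch of the above, assuming the sample data from the question and using a copy so that the original df stays unchanged (the name df_with_cumsum is borrowed from the question's desired output):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "fruit": ["orange", "apple", "mango"],
    "val1": [15, 10, 5],
    "val2": [3, 13, 5],
})

# Work on a copy so the original df is left untouched.
df_with_cumsum = df.copy()

# Running total of val1 only; the string column is never involved.
df_with_cumsum["cum_sum"] = df_with_cumsum["val1"].cumsum()

# Running total as a percentage of the overall val1 total.
df_with_cumsum["cum_perc"] = (
    100 * df_with_cumsum["cum_sum"] / df_with_cumsum["val1"].sum()
)

print(df_with_cumsum)
```

Because cumsum() is called on the single numeric column rather than on the whole DataFrame, the fruit column never enters the calculation.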

Sayan Sil
BrenBarn
7

It's a good answer, but it was written in 2014. I modified it slightly so that it runs without errors and the results match the example in the question (the percentage is rounded to two decimal places).

df['cum_sum'] = df["val1"].cumsum()
df['cum_perc'] = round(100*df.cum_sum/df["val1"].sum(),2)
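The same rounding can also be written with the Series.round method instead of the built-in round. A short sketch, assuming the sample data from the question:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "fruit": ["orange", "apple", "mango"],
    "val1": [15, 10, 5],
    "val2": [3, 13, 5],
})

df["cum_sum"] = df["val1"].cumsum()

# Series.round(2) rounds each element to two decimal places.
df["cum_perc"] = (100 * df["cum_sum"] / df["val1"].sum()).round(2)

print(df)
```

Both spellings should give the same values here; the method form just keeps the whole expression in one chain.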
vvvvv
Gene
0

The answer above is good, but out of date. I have updated it so that it works; this version also rounds the percentage to two decimal places.

df['cum_sum'] = df['val1'].cumsum()

df['cum_perc'] = round((df.cum_sum/df['val1'].sum())*100,2)

ASi