I was wondering how to add a column "total" to a pandas DataFrame, containing the sum of "number" from that particular row down to the last row (a reverse cumulative sum). To clarify what I mean: I already have "index" and "number", and I want to create "total":

index   number    total
0       5         18 (5+7+3+1+2)
1       7         13 (7+3+1+2)
2       3         6 (3+1+2)
3       1         3 (1+2)
4       2         2

I have no idea where to start, so any help would be greatly appreciated!

Thanks a lot in advance.

yatu
Minte
    Does this answer your question? [Perform a reverse cumulative sum on a numpy array](https://stackoverflow.com/questions/16541618/perform-a-reverse-cumulative-sum-on-a-numpy-array) – sushanth Jul 10 '20 at 08:42

2 Answers

Here's a NumPy-based approach: np.triu builds an upper triangular matrix from number (row i keeps only the values from position i onward), and summing along the second axis gives the totals. Note: this allocates an n × n matrix, so for larger arrays the cumsum approach from the linked duplicate, np.cumsum(a[::-1])[::-1], is much more efficient.

import numpy as np

a = df.number.to_numpy()
df['total'] = np.triu(a).sum(axis=1)  # row i sums number[i:]

print(df)

   index  number  total
0      0       5     18
1      1       7     13
2      2       3      6
3      3       1      3
4      4       2      2
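
To see why this works and how it compares with the cumsum approach mentioned above, here is a minimal self-contained sketch. The DataFrame is rebuilt from the question's sample data, and the names `upper`, `via_triu`, and `via_cumsum` are only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"number": [5, 7, 3, 1, 2]})
a = df["number"].to_numpy()

# np.triu broadcasts the 1-D array to an n x n matrix and zeroes
# everything below the diagonal, so row i holds a[i], a[i+1], ...
upper = np.triu(a)
via_triu = upper.sum(axis=1)

# Reverse, take the cumulative sum, reverse back:
# O(n) memory instead of the O(n^2) matrix above
via_cumsum = np.cumsum(a[::-1])[::-1]

print(upper)
print(via_triu, via_cumsum)
```

Both arrays should match the "total" column shown above; the difference is only in how much intermediate memory each approach needs.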
yatu

One quick solution is to reverse the row order with iloc[::-1], apply pandas.Series.cumsum, and then reverse the result back with [::-1]:

df['total'] = df['number'].iloc[::-1].cumsum()[::-1]

Output

#    index  number  total
# 0      0       5     18
# 1      1       7     13
# 2      2       3      6
# 3      3       1      3
# 4      4       2      2
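
For completeness, here is a self-contained sketch of the same idea, rebuilding the sample DataFrame from the question. It also illustrates a side point worth checking on your own data: pandas aligns on the index when a Series is assigned to a column, so with a unique index the trailing [::-1] should not change the result. The column name `total_aligned` is hypothetical and only there for comparison.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"number": [5, 7, 3, 1, 2]})

# Reverse the rows, accumulate, then restore the original order
df["total"] = df["number"].iloc[::-1].cumsum()[::-1]

# Column assignment aligns on the index, so skipping the final
# reversal yields the same column (assuming a unique index)
df["total_aligned"] = df["number"].iloc[::-1].cumsum()

print(df)
```

Keeping the final [::-1] is still the safer habit if the result might later be converted to a plain array, where order rather than index labels determines placement.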
Ric S