How do I perform a summation of `n` rows at a time in pandas?

Question

Given a data frame

This data frame has 20 columns. I would like to group n=5 rows at a time and sum them up. So, my output would look like this:

df.rolling_sum will not help because it does not allow you to vary the stride when summing.

What other ways are there to do this?

Before blindly downvoting, PLEASE READ: [duplicates can serve as a signpost, directing users to useful answers on another question.](https://stackoverflow.blog/2009/05/20/linking-duplicate-questions/) There is nothing wrong with asking a question that was marked duplicate. — cs95, Aug 05 '17 at 19:55
Agreed. Also, this isn't an exact duplicate. It's close enough that I won't undo the dup flag. But yours asked about sum. The other asked about average. Different people will think to search for different things. If your question is phrased in a way that ultimately guides more people to the answer they need, then you have contributed a useful question, even if it is a duplicate. — piRSquared, Aug 05 '17 at 20:40
@piRSquared I say the question is definitely a duplicate as the questions do not really ask how to take the `sum` or the `mean` but in general how to group (to aggregate) these observations. But I agree duplicates are useful. I might have even closed the other one as a duplicate of this one if it weren't were for Divakar's answer because the phrase "n rows at a time" seems more common. Anyway, I wouldn't read so much into the downvotes as everybody here has their own standards it is not possible to keep up with all of them. — ayhan, Aug 06 '17 at 18:31

score 5 · Accepted Answer · answered Aug 05 '17 at 19:19

5

df.set_index(df.index // 5).sum(level=0)

answered Aug 05 '17 at 19:19

piRSquared

285,575
57
475
624

Oh, this is new. Didn't know about levels... cool... – cs95 Aug 05 '17 at 19:22

cs95 · Answer 2 · 2019-01-12T03:17:12.097

Assuming your indices are contiguous, you can perform integer division on df.index, and then group by index.

For the df above, you can do this:

df.index // 5
# Int64Index([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3], dtype='int64')

Getting the final answer is just one more step, using df.groupby and dfGroupBy.sum:

df.groupby(df.index // 5).sum()

     A
0  250
1  347
2  266
3  328

If you do not have a RangeIndex, use df.reset_index first and then group.

score 2 · Answer 3 · answered Aug 05 '17 at 19:21

If you can manage an ndarray with the sums as opposed to a Series (you could always construct a Series again anyhow), you could use np.add.reduceat.

np.add.reduceat(df.A.values, np.arange(0, df.A.size, 5))

Which in this case returns

array([250, 347, 266, 328])

How do I perform a summation of `n` rows at a time in pandas?

3 Answers3