4

Given a data frame

     A
0   14
1   59
2   38
3   40
4   99
5   89
6   70
7   64
8   84
9   40
10  30
11  94
12  65
13  29
14  48
15  26
16  80
17  79
18  74
19  69

This data frame has 20 columns. I would like to group n=5 rows at a time and sum them up. So, my output would look like this:

     A
0  250
1  347
2  266
3  328 

df.rolling_sum will not help because it does not allow you to vary the stride when summing.

What other ways are there to do this?

cs95
  • 379,657
  • 97
  • 704
  • 746
  • 1
    Before blindly downvoting, PLEASE READ: [duplicates can serve as a signpost, directing users to useful answers on another question.](https://stackoverflow.blog/2009/05/20/linking-duplicate-questions/) There is nothing wrong with asking a question that was marked duplicate. – cs95 Aug 05 '17 at 19:55
  • 1
    Agreed. Also, this isn't an exact duplicate. It's close enough that I won't undo the dup flag. But yours asked about sum. The other asked about average. Different people will think to search for different things. If your question is phrased in a way that ultimately guides more people to the answer they need, then you have contributed a useful question, even if it is a duplicate. – piRSquared Aug 05 '17 at 20:40
  • 1
    I also added my answer to that question – piRSquared Aug 05 '17 at 20:41
  • 1
    @piRSquared And I upvoted! Thanks for the support. – cs95 Aug 05 '17 at 20:47
  • 1
    @piRSquared I say the question is definitely a duplicate as the questions do not really ask how to take the `sum` or the `mean` but in general how to group (to aggregate) these observations. But I agree duplicates are useful. I might have even closed the other one as a duplicate of this one if it weren't were for Divakar's answer because the phrase "n rows at a time" seems more common. Anyway, I wouldn't read so much into the downvotes as everybody here has their own standards it is not possible to keep up with all of them. – ayhan Aug 06 '17 at 18:31

3 Answers3

5
df.set_index(df.index // 5).sum(level=0)
piRSquared
  • 285,575
  • 57
  • 475
  • 624
2

Assuming your indices are contiguous, you can perform integer division on df.index, and then group by index.

For the df above, you can do this:

df.index // 5
# Int64Index([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3], dtype='int64')

Getting the final answer is just one more step, using df.groupby and dfGroupBy.sum:

df.groupby(df.index // 5).sum()

     A
0  250
1  347
2  266
3  328

If you do not have a RangeIndex, use df.reset_index first and then group.

cs95
  • 379,657
  • 97
  • 704
  • 746
2

If you can manage an ndarray with the sums as opposed to a Series (you could always construct a Series again anyhow), you could use np.add.reduceat.

np.add.reduceat(df.A.values, np.arange(0, df.A.size, 5))

Which in this case returns

array([250, 347, 266, 328])
miradulo
  • 28,857
  • 6
  • 80
  • 93