I'd like to find a general way to group a DataFrame by a specified number of rows or columns. Example DataFrame:

import pandas as pd
df = pd.DataFrame(0, index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7'])

   c1  c2  c3  c4  c5  c6  c7
a   0   0   0   0   0   0   0
b   0   0   0   0   0   0   0
c   0   0   0   0   0   0   0
d   0   0   0   0   0   0   0
e   0   0   0   0   0   0   0
f   0   0   0   0   0   0   0

For example, I'd like to group by 2 rows at a time and apply a function such as mean. I'd also like to know how to group by N columns at a time and apply a function.

Expected output when grouping by 2 rows at a time:

   c1  c2  c3  c4  c5  c6  c7
0   0   0   0   0   0   0   0
1   0   0   0   0   0   0   0
2   0   0   0   0   0   0   0

Expected output when grouping by 2 columns at a time:

   0  1  2  3
a  0  0  0  0
b  0  0  0  0
c  0  0  0  0
d  0  0  0  0
e  0  0  0  0
f  0  0  0  0
luca

1 Answer

This groups every N consecutive rows:

>>> import numpy as np
>>> N = 2

>>> df.groupby(np.arange(len(df.index))//N, axis=0).mean()
   c1  c2  c3  c4  c5  c6  c7
0   0   0   0   0   0   0   0
1   0   0   0   0   0   0   0
2   0   0   0   0   0   0   0

This groups every N consecutive columns:

>>> df.groupby(np.arange(len(df.columns))//N, axis=1).mean()
   0  1  2  3
a  0  0  0  0
b  0  0  0  0
c  0  0  0  0
d  0  0  0  0
e  0  0  0  0
f  0  0  0  0
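
Recent pandas releases deprecate `axis=1` in `DataFrame.groupby`, so the call above may emit a warning depending on the installed version. One possible workaround, sketched here on made-up data, is to transpose, group the rows, and transpose back. The values and the names `col_groups` and `col_means` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 rows, 7 columns, so the last column group has one member.
df = pd.DataFrame(
    np.arange(14).reshape(2, 7),
    index=["a", "b"],
    columns=[f"c{i}" for i in range(1, 8)],
)

N = 2
# One label per block of N columns: [0, 0, 1, 1, 2, 2, 3].
col_groups = np.arange(len(df.columns)) // N

# Transpose so the columns become rows, group those rows, then transpose back.
col_means = df.T.groupby(col_groups).mean().T
print(col_means)
```

With 7 columns and N = 2 the result has 4 columns, matching the expected output in the question; the final column is the mean of `c7` alone.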
luca
  • After resetting the columns, pass `axis=1` to group by columns: `df.groupby(by=lambda x: x//N, axis=1).mean()`. – Zero Sep 28 '17 at 21:33
  • Luca, I said to reset the columns first with `df.columns = range(0, len(df.columns))`, and then apply `df.groupby(by=lambda x: x//N, axis=1).mean()`. – Zero Sep 28 '17 at 21:35
  • See https://stackoverflow.com/questions/36810595/calculate-average-of-every-x-rows-in-a-table-and-create-new-table. `np.arange` would be valid for columns as well (you'd change it to `df.groupby(np.arange(len(df.columns))//N, axis=1)`). – ayhan Sep 28 '17 at 21:42