
I have a dataframe of shape [100, 50000] and I want to reduce it by taking the mean of each row in chunks of 5, so that I get a dataframe of shape [100, 10000]. For example, if the row is

[1,8,-1,0,2 , 6,8,11,4,6]

the output will be

[2,7]

What is the most efficient way to do so?

Thanks

Cranjis
  • Does this answer your question? [Calculate average of every x rows in a table and create new table](https://stackoverflow.com/questions/36810595/calculate-average-of-every-x-rows-in-a-table-and-create-new-table) – Celius Stingher Feb 05 '20 at 13:55

1 Answer


If the shape (100, 50000) means 100 rows and 50000 columns, the solution is GroupBy.mean with a helper array from np.arange, built from the number of columns, and axis=1:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,8,-1,0,2 , 6,8,11,4,6],
                   [1,8,-1,0,2 , 6,8,11,4,6]])
print (df)
   0  1  2  3  4  5  6   7  8  9
0  1  8 -1  0  2  6  8  11  4  6
1  1  8 -1  0  2  6  8  11  4  6

print (df.shape)
(2, 10)

df = df.groupby(np.arange(len(df.columns)) // 5, axis=1).mean()
print (df)
     0    1
0  2.0  7.0
1  2.0  7.0
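One caveat: as far as I know, the axis=1 argument to groupby is deprecated in recent pandas releases (2.1 and later). Below is a minimal sketch of an equivalent that avoids it, assuming the same toy frame as above: transpose, group the rows, then transpose back. The names chunk, labels and out are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 8, -1, 0, 2, 6, 8, 11, 4, 6],
                   [1, 8, -1, 0, 2, 6, 8, 11, 4, 6]])

chunk = 5  # number of adjacent columns averaged together (illustrative name)

# One group label per column: 0,0,0,0,0,1,1,1,1,1
labels = np.arange(len(df.columns)) // chunk

# Transpose so the columns become rows, group those rows, then transpose back
out = df.T.groupby(labels).mean().T
print(out)
```

The result has one column per chunk, the same as the axis=1 version, but the two transposes may cost extra memory on a wide frame.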

If instead it means 100 columns and 50000 rows, the solution is GroupBy.mean with a helper array from np.arange, built from the length of the DataFrame:

df = pd.DataFrame({'a': [1,8,-1,0,2 , 6,8,11,4,6],
                   'b': [1,8,-1,0,2 , 6,8,11,4,6]})
print (df)
    a   b
0   1   1
1   8   8
2  -1  -1
3   0   0
4   2   2
5   6   6
6   8   8
7  11  11
8   4   4
9   6   6

print (df.shape)
(10, 2)

df = df.groupby(np.arange(len(df)) // 5).mean()
print (df)
     a    b
0  2.0  2.0
1  7.0  7.0
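Since the question asks about efficiency: if the data is purely numeric and the number of columns is an exact multiple of the chunk size, the row-wise case can probably also be done without groupby, by reshaping the underlying NumPy array. This is only a sketch under those assumptions, and I have not benchmarked it here. The names chunk, means and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 8, -1, 0, 2, 6, 8, 11, 4, 6],
                   [1, 8, -1, 0, 2, 6, 8, 11, 4, 6]])

chunk = 5  # number of adjacent columns averaged together (illustrative name)
n_rows, n_cols = df.shape

# The reshape below only works if the columns divide evenly into chunks
assert n_cols % chunk == 0

# View each row as (number of chunks, chunk) and average over the last axis
means = df.to_numpy().reshape(n_rows, n_cols // chunk, chunk).mean(axis=2)

result = pd.DataFrame(means, index=df.index)
print(result)
```

For the other orientation (chunks of 5 rows), the same idea should apply with reshape(n_rows // chunk, chunk, n_cols) and mean(axis=1).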
jezrael