0

So I have a list of generator generation in excel:

Gen1    Gen2    Gen3    Gen4
 12      88      32      13
 22      21      71      11
 22     354      32      74
 43      21      65      11
 33      21      44      11
 55      22      32      43  
 44      21      23      11
 12      21      32      77
 76      33      67      11
 123     55      32      67
 32      21      32      14
 49      76      78      11
 54      21      32      11
 33      21      32      44
 22      44      998     43
 77      21      32      32
 09      34      55      11
 98      21      32      14
 33      21      44      16
 21      23      22     341

There are 20 values in each column and I wan to reduce to 5 values per column, which will be done by taking the average of 4 values at a time. So it will be like taking the average of 1-4 elements first then 5-8, 9-12, 13-16 and last 17-20 elements. The expected output will look like this:

Gen1    Gen2    Gen3    Gen4
24.75   121      50     27.25
 36      21.25   32.75  35.5
 70      46.25   52.25  25.75 
 46.5    26.75   273.5  32.5
 40.25   24.75   38.25  95.5

I know how to do this in python and my approach is this which gives the exact output that I want:

import numpy as np
gen1 = np.array(df['Gen1'])
gen2 = np.array(df['Gen2'])
gen3 = np.array(df['Gen3'])
gen4 = np.array(df['Gen4'])

solutionArr = np.zeros((5,4))
gen1List = []
gen2List = []
gen3List = []
gen4List = []

for i in range(5):
    gen1List.append(np.mean(gen1[i*4:(i+1)*4]))
    gen2List.append(np.mean(gen2[i*4:(i+1)*4]))
    gen3List.append(np.mean(gen3[i*4:(i+1)*4]))
    gen4List.append(np.mean(gen4[i*4:(i+1)*4]))

solutionArr[:,0] = gen1List
solutionArr[:,1] = gen2List
solutionArr[:,2] = gen3List
solutionArr[:,3] = gen4List
print(solutionArr)

But I want to use pandas for this because in real dataset I have more 140 Generators with very different names(not like gen1, gen2 etc). So can anyone please help with this?

Vesper
  • 795
  • 1
  • 9
  • 21

1 Answers1

3

You can do Floor division on index and then groupby and mean:

df.groupby(df.index//4).mean()

If index is not a range index do:

out = df.reset_index()
out.drop("index",1).groupby(out.index//4).mean()

Or,

df.groupby(np.arange(len(df)) // 4).mean()

    Gen1    Gen2    Gen3   Gen4
0  24.75  121.00   50.00  27.25
1  36.00   21.25   32.75  35.50
2  70.00   46.25   52.25  25.75
3  46.50   26.75  273.50  32.50
4  40.25   24.75   38.25  95.50

Check out this answer: Calculate average of every x rows in a table and create new table

Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53
anky
  • 74,114
  • 11
  • 41
  • 70