0

I have a dataframe:

df = pd.DataFrame({0: [1, 2, 5, 13], 1: [1, 4, 3, 1], 2: [1, 2, 5, 10], 3: [4, 4, 5, 3], 4: [0, 6, 1, 1], 5: [5, 6, 4, 5], 6: [0, 9, 9, 0], 7: [1, 1, 1, 1]})

df

0   1   2   3   4   5   6   7
1   1   1   4   0   5   0   1
2   4   2   4   6   6   9   1
5   3   5   5   1   4   9   1
13  1   10  3   1   5   0   1

I want to take the average values of each 2 side-by-side elements but sliced every 4 columns (average1 = columns 0, 1, 2, 3, average2 = columns 1, 2, 3, 4, average3 = columns 2, 3, 4, 5 ....etc).

For example pseudo code would be:

for index in range(len(df.columns)):
  df_1 = df.iloc[:index, index:index+1]
  df_2 = df.iloc[:index, index+2:index+3]
  df_avg = pd.concat([df_1, df_2]).mean(axis=1)

The output I desire is:

df_avg

(1+1+1+4)/4    (1+1+4+0)/4  . . . .  (0+5+0+1)/4
(4+2+4+6)/4    (4+2+4+6)/4  . . . .  (6+6+9+1)/4
.
.
.
(13+1+10+3)/4  (1+10+3+1)/4  . . . .  (1+5+0+1)/4



df_avg

1.75   1.50 . . . . 1.50
4.00   4.00 . . . . 5.50
6.75   3.75 . . . . 1.75

Is there an easy way to do this with groupby().mean() or possibly .rolling().mean() methods?

HelloToEarth
  • 2,027
  • 3
  • 22
  • 48
  • Is that what your actual data looks like? If so, Pandas might not be the right data structure for this. – AMC Feb 10 '20 at 23:13

1 Answers1

1

IIUC rolling with mean

df_ave=df.rolling(4,axis=1).mean().dropna(1)
df_ave
Out[100]: 
      3     4     5     6     7
0  1.75  1.50  2.50  2.25  1.50
1  3.00  4.00  4.50  6.25  5.50
2  4.50  3.50  3.75  4.75  3.75
3  6.75  3.75  4.75  2.25  1.75

First split

df1, df2 = [y for _, y in df.groupby(df.columns//4,axis=1)]
(df1.rolling(2,axis=1).mean().dropna(1).values + df2.rolling(2,axis=1).mean().dropna(1).values)/2
Out[112]: 
array([[1.75, 1.75, 1.5 ],
       [4.5 , 5.25, 4.  ],
       [3.25, 5.25, 5.  ],
       [5.  , 4.  , 3.5 ]])
BENY
  • 317,841
  • 20
  • 164
  • 234
  • This is off the track of the original post but what if I wanted columns 0, 1, 10, 11 and then 1, 2, 11, 12....etc --- so for every 10-away columns rather than side-by-side columns? – HelloToEarth Feb 10 '20 at 22:43
  • @HelloToEarth can you explain more about the 1,2,10,11 logic ? – BENY Feb 10 '20 at 22:45
  • @HelloToEarth this is more like selft_define step length , then we need for loop here, can not be down by rolling – BENY Feb 10 '20 at 22:47
  • So if we suppose the dataframe column number is divisible by 11 -- I'd like to simply take slices of 4 elements (2 elements starting from 0, 1 and the next 2 starting from 10, 11 all the way to the end of the dataframe); this is the exact same logic as above but instead of rolling every 4 side-by-side elements we get slices that are 10 columns away from one another. – HelloToEarth Feb 10 '20 at 22:48
  • @HelloToEarth split the data frame into two, then each of them do the rolling mean, and add up do the mean – BENY Feb 10 '20 at 22:51
  • @HelloToEarth check the update ? – BENY Feb 10 '20 at 22:56
  • Yes. I think there is an issue for the list comprehension using y for _, y in.... expects 1 output argument but tries to unpack 2. – HelloToEarth Feb 10 '20 at 23:05