Averaging Slices of a Dataframe in Pandas

Question

I have a dataframe:

df = pd.DataFrame({0: [1, 2, 5, 13], 1: [1, 4, 3, 1], 2: [1, 2, 5, 10], 3: [4, 4, 5, 3], 4: [0, 6, 1, 1], 5: [5, 6, 4, 5], 6: [0, 9, 9, 0], 7: [1, 1, 1, 1]})

df

0   1   2   3   4   5   6   7
1   1   1   4   0   5   0   1
2   4   2   4   6   6   9   1
5   3   5   5   1   4   9   1
13  1   10  3   1   5   0   1

I want to take the average values of each 2 side-by-side elements but sliced every 4 columns (average1 = columns 0, 1, 2, 3, average2 = columns 1, 2, 3, 4, average3 = columns 2, 3, 4, 5 ....etc).

For example pseudo code would be:

for index in range(len(df.columns)):
  df_1 = df.iloc[:index, index:index+1]
  df_2 = df.iloc[:index, index+2:index+3]
  df_avg = pd.concat([df_1, df_2]).mean(axis=1)

The output I desire is:

df_avg

(1+1+1+4)/4    (1+1+4+0)/4  . . . .  (0+5+0+1)/4
(4+2+4+6)/4    (4+2+4+6)/4  . . . .  (6+6+9+1)/4
.
.
.
(13+1+10+3)/4  (1+10+3+1)/4  . . . .  (1+5+0+1)/4



df_avg

1.75   1.50 . . . . 1.50
4.00   4.00 . . . . 5.50
6.75   3.75 . . . . 1.75

Is there an easy way to do this with groupby().mean() or possibly .rolling().mean() methods?

Is that what your actual data looks like? If so, Pandas might not be the right data structure for this. — AMC, Feb 10 '20 at 23:13

BENY · Accepted Answer · 2020-02-10T22:56:03.803

1

IIUC rolling with mean

df_ave=df.rolling(4,axis=1).mean().dropna(1)
df_ave
Out[100]: 
      3     4     5     6     7
0  1.75  1.50  2.50  2.25  1.50
1  3.00  4.00  4.50  6.25  5.50
2  4.50  3.50  3.75  4.75  3.75
3  6.75  3.75  4.75  2.25  1.75

First split

df1, df2 = [y for _, y in df.groupby(df.columns//4,axis=1)]
(df1.rolling(2,axis=1).mean().dropna(1).values + df2.rolling(2,axis=1).mean().dropna(1).values)/2
Out[112]: 
array([[1.75, 1.75, 1.5 ],
       [4.5 , 5.25, 4.  ],
       [3.25, 5.25, 5.  ],
       [5.  , 4.  , 3.5 ]])

edited Feb 10 '20 at 22:56

answered Feb 10 '20 at 22:40

BENY

317,841
20
164
234

This is off the track of the original post but what if I wanted columns 0, 1, 10, 11 and then 1, 2, 11, 12....etc --- so for every 10-away columns rather than side-by-side columns? – HelloToEarth Feb 10 '20 at 22:43
@HelloToEarth can you explain more about the 1,2,10,11 logic ? – BENY Feb 10 '20 at 22:45
@HelloToEarth this is more like selft_define step length , then we need for loop here, can not be down by rolling – BENY Feb 10 '20 at 22:47
So if we suppose the dataframe column number is divisible by 11 -- I'd like to simply take slices of 4 elements (2 elements starting from 0, 1 and the next 2 starting from 10, 11 all the way to the end of the dataframe); this is the exact same logic as above but instead of rolling every 4 side-by-side elements we get slices that are 10 columns away from one another. – HelloToEarth Feb 10 '20 at 22:48
@HelloToEarth split the data frame into two, then each of them do the rolling mean, and add up do the mean – BENY Feb 10 '20 at 22:51
@HelloToEarth check the update ? – BENY Feb 10 '20 at 22:56
Yes. I think there is an issue for the list comprehension using y for _, y in.... expects 1 output argument but tries to unpack 2. – HelloToEarth Feb 10 '20 at 23:05

Averaging Slices of a Dataframe in Pandas

1 Answers1