Pandas - How to downsample non time series pandas column

Question

Given:

import pandas as pd
df = pd.DataFrame({'ranges': [2,4,6,2,4,1,2,5,3,2,3,5,6,3,2,1,4,6,3,2]})

I would like to have a new DataFrame where the 20 values (actually 1000 in my real case) get reduced to 4 values, each being the average (or other function) of the corresponding group of 5, so in other words:

average of (2,4,6 and 2), average of (4,1,2,5) etc.

It's like downsampling and it's related to binning. I am stumped. I bet it becomes a one-liner.

Comment (Feb 01 '22 at 00:53): "average of (2,4,6 and 2), average of (4,1,2,5) etc" will produce 5 values from 20, right?

score 2 · Answer 1 · answered Feb 01 '22 at 00:47


Try this:

>>> df.groupby(df.index // 4)['ranges'].mean()
0    3.50
1    3.00
2    3.25
3    3.00
4    3.75
Name: ranges, dtype: float64
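This groups by 4, following the example in the question rather than the stated group size of 5, so it returns 5 means. The key `df.index // 4` also relies on the default 0..n-1 integer index. A minimal sketch, assuming your real DataFrame might carry a different index (simulated here with 1-based labels), that builds the group keys from row positions instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ranges': [2,4,6,2,4,1,2,5,3,2,3,5,6,3,2,1,4,6,3,2]})

# Simulate a non-default index: labels 1..20 instead of 0..19.
shifted = df.set_index(df.index + 1)

# Keys from index labels: label 20 lands alone in its own group.
wrong = shifted.groupby(shifted.index // 4)['ranges'].mean()
print(len(wrong))  # 6 groups, not 5

# Keys from row positions (0, 0, 0, 0, 1, ...) ignore the labels entirely.
keys = np.arange(len(shifted)) // 4
out = shifted.groupby(keys)['ranges'].mean()
print(out.tolist())  # [3.5, 3.0, 3.25, 3.0, 3.75]
```

With the default RangeIndex both keys are identical, so the positional version only matters once the index has been filtered or relabeled.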


score 1 · Accepted Answer · answered Feb 01 '22 at 00:57

You can floor-divide the index by the group size and group by the result to find the mean of each group:

out = df.groupby(df.index//5)['ranges'].mean()

Output:

0    3.6
1    2.6
2    3.8
3    3.2
Name: ranges, dtype: float64
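The question also asks about "other functions" besides the average. A short sketch using the same grouping key with `SeriesGroupBy.agg`, which accepts a list of reduction names or any callable that reduces each group to a scalar (the max-minus-min "spread" is just an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame({'ranges': [2,4,6,2,4,1,2,5,3,2,3,5,6,3,2,1,4,6,3,2]})

# Same key as above: rows 0-4 -> group 0, rows 5-9 -> group 1, and so on.
groups = df.groupby(df.index // 5)['ranges']

# Several built-in reductions at once; the result has one column per function.
summary = groups.agg(['mean', 'median', 'max'])
print(summary)

# Any callable that reduces a Series to a scalar also works.
spread = groups.agg(lambda s: s.max() - s.min())
print(spread.tolist())  # [4, 4, 4, 5]
```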

If the number of rows is divisible by the size of each group, we can use NumPy:

out = df['ranges'].to_numpy().reshape(-1,5).mean(axis=1)

Output:

array([3.6, 2.6, 3.8, 3.2])
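When the length is not a multiple of the group size, `reshape(-1, 5)` raises an error. One option is to trim the tail with `values[: len(values) // size * size]` before reshaping, which drops the partial last group. A minimal sketch of an alternative, assuming you want to keep the shorter final group (as the `groupby` version does), using `np.add.reduceat`; the helper name `block_means` is hypothetical:

```python
import numpy as np

def block_means(values, size):
    """Mean of consecutive blocks of `size`; the last block may be shorter.

    Assumes a non-empty 1-D input.
    """
    values = np.asarray(values, dtype=float)
    starts = np.arange(0, len(values), size)          # first index of each block
    sums = np.add.reduceat(values, starts)            # sum of each block
    counts = np.diff(np.append(starts, len(values)))  # length of each block
    return sums / counts

# 22 values: four full blocks of 5 plus a trailing block (7, 9).
data = np.array([2,4,6,2,4,1,2,5,3,2,3,5,6,3,2,1,4,6,3,2,7,9])
print(block_means(data, 5))  # block means 3.6, 2.6, 3.8, 3.2 and 8.0
```

For a length that divides evenly this gives the same result as the `reshape` version.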
