How to get the median of different intervals of dataframe based on label name?

Question

So I have a DataFrame with two columns, one with label names (df['Labels']) and the other with int values (df['Volume']).

df = pd.DataFrame({'Labels': 
['A','A','A','A','B','B','B','B','B','B','A','A','A','A','A','A','A','A','C','C','C','C','C'],
                   'Volume':[10,40,20,20,50,60,40,50,50,60,10,10,10,10,20,20,10,20,80,90,90,80,100]})

I would like to identify intervals where my labels change and then calculate the median on the column 'Volume' for each of these intervals. Later I should replace every value of column 'Volume' by the respective median of each interval.

In case of label A, I would like to have the median for both intervals.

Here is how my DataFrame should looks like:

df2 = pd.DataFrame({'Labels':['A','A','A','A','B','B','B','B','B','B','A','A','A','A','A','A','A','A','C','C','C','C','C'],
                    'Volume':[20,20,20,20,50,50,50,50,50,50,10,10,10,10,10,10,10,10,90,90,90,90,90]})

`df.groupby(df.Labels.ne(df.Labels.shift()).cumsum()).transform('median')` — user3483203, Oct 17 '19 at 15:08

score 1 · Accepted Answer · answered Oct 17 '19 at 15:07

1

You want to groupby the blocks and transform median:

blocks = df['Labels'].ne(df['Labels'].shift()).cumsum()
df['group_median'] = df['Volume'].groupby(blocks).transform('median')

answered Oct 17 '19 at 15:07

Quang Hoang

146,074
10
56
74

score 1 · Answer 2 · answered Oct 17 '19 at 15:08

Use Series.cumsum + Series.shift() to create groups using groupby and then use transform

df['Volume']=df.groupby(df['Labels'].ne(df['Labels'].shift()).cumsum())['Volume'].transform('median')
print(df)

   Labels  Volume
0       A      20
1       A      20
2       A      20
3       A      20
4       B      50
5       B      50
6       B      50
7       B      50
8       B      50
9       B      50
10      A      10
11      A      10
12      A      10
13      A      10
14      A      10
15      A      10
16      A      10
17      A      10
18      C      90
19      C      90
20      C      90
21      C      90
22      C      90

How to get the median of different intervals of dataframe based on label name?

2 Answers2