0

So I have a DataFrame with two columns, one with label names (df['Labels']) and the other with int values (df['Volume']).

df = pd.DataFrame({'Labels': 
['A','A','A','A','B','B','B','B','B','B','A','A','A','A','A','A','A','A','C','C','C','C','C'],
                   'Volume':[10,40,20,20,50,60,40,50,50,60,10,10,10,10,20,20,10,20,80,90,90,80,100]})

I would like to identify intervals where my labels change and then calculate the median on the column 'Volume' for each of these intervals. Later I should replace every value of column 'Volume' by the respective median of each interval.

In case of label A, I would like to have the median for both intervals.

Here is how my DataFrame should looks like:

df2 = pd.DataFrame({'Labels':['A','A','A','A','B','B','B','B','B','B','A','A','A','A','A','A','A','A','C','C','C','C','C'],
                    'Volume':[20,20,20,20,50,50,50,50,50,50,10,10,10,10,10,10,10,10,90,90,90,90,90]})
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
user026
  • 638
  • 4
  • 14

2 Answers2

1

You want to groupby the blocks and transform median:

blocks = df['Labels'].ne(df['Labels'].shift()).cumsum()
df['group_median'] = df['Volume'].groupby(blocks).transform('median')
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
1

Use Series.cumsum + Series.shift() to create groups using groupby and then use transform

df['Volume']=df.groupby(df['Labels'].ne(df['Labels'].shift()).cumsum())['Volume'].transform('median')
print(df)

   Labels  Volume
0       A      20
1       A      20
2       A      20
3       A      20
4       B      50
5       B      50
6       B      50
7       B      50
8       B      50
9       B      50
10      A      10
11      A      10
12      A      10
13      A      10
14      A      10
15      A      10
16      A      10
17      A      10
18      C      90
19      C      90
20      C      90
21      C      90
22      C      90
ansev
  • 30,322
  • 5
  • 17
  • 31