Get the median of a column based on the weights from another column

Question

I have a data frame like this:

col1     col2
 100      3
 200      2
 300      4
 400      1

Now I want the median of col1 in such a way that the col2 values act as weights for the col1 values, like this:

median of [100, 100, 100, 200, 200, 300, 300, 300, 300, 400]  # 100 appears 3 times because its weight is 3
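Worked out for the sample data, the total weight is even, so the median of the expanded list is the mean of its 5th and 6th sorted values:

```latex
N = \sum_i w_i = 3 + 2 + 4 + 1 = 10, \qquad
\operatorname{median} = \frac{x_{(5)} + x_{(6)}}{2} = \frac{200 + 300}{2} = 250
```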

I can do it by creating multiple rows based on the weights, but I can't afford the extra rows. Is there a more efficient way to do it without creating multiple rows, either in Python or PySpark?

score 1 · Answer 1 · answered Aug 08 '23 at 06:33


Repeat the values, then calculate the median:

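import pandas as pd

df = pd.DataFrame({'col1': [100, 200, 300, 400], 'col2': [3, 2, 4, 1]})
# repeat each col1 value col2 times, then take the ordinary median
df.loc[df.index.repeat(df['col2']), 'col1'].median()

250.0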
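The repeat approach still materializes every repeated value, which the question wanted to avoid. A different technique, sketched here under the assumption of non-negative integer weights with a positive total, is to sort by value, take a cumulative sum of the weights, and look up which value sits at the middle position(s) of the virtual expanded list. `weighted_median` is a hypothetical helper name, not a pandas or NumPy function:

```python
import numpy as np
import pandas as pd


def weighted_median(values, weights):
    """Median of `values` as if each value were repeated `weights` times.

    Assumes non-negative integer weights with a positive total. The
    expanded list is never built; only cumulative counts are used.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=np.int64)
    order = np.argsort(values, kind="stable")
    v = values[order]
    cum = np.cumsum(weights[order])  # running count of expanded rows
    n = cum[-1]                      # length of the virtual expanded list
    # 0-based positions of the middle element(s); equal when n is odd
    lo = (n - 1) // 2
    hi = n // 2
    # expanded position p holds v[i] for the first i with cum[i] > p
    v_lo = v[np.searchsorted(cum, lo, side="right")]
    v_hi = v[np.searchsorted(cum, hi, side="right")]
    return float((v_lo + v_hi) / 2)


df = pd.DataFrame({'col1': [100, 200, 300, 400], 'col2': [3, 2, 4, 1]})
print(weighted_median(df['col1'], df['col2']))  # 250.0
```

For integer weights this should agree with the repeat-then-median result, while memory stays proportional to the original number of rows; the sort makes it O(n log n). Zero-weight rows are skipped naturally because `searchsorted(..., side="right")` passes over repeated cumulative counts.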


Shubham Sharma


