I have a data frame like this:
col1 col2
100 3
200 2
300 4
400 1
Now I want to compute the median of col1, using the col2 values as the weights for the corresponding col1 values, like this:
median of [100, 100, 100, 200, 200, 300, 300, 300, 300, 400] # 100 appears 3 times because its weight is 3
I can do it by expanding each row into multiple rows according to its weight, but I can't afford the extra rows. Is there a more efficient way to do this without creating additional rows, in either Python or PySpark?
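
For reference, here is a minimal sketch of one possible approach in pandas/NumPy. It sorts by value and uses the cumulative sum of the weights to locate the middle position(s), so no rows are expanded. It assumes the weights are non-negative integers with a positive total. The function name `weighted_median` is my own and not a library API.

```python
import numpy as np
import pandas as pd

def weighted_median(values, weights):
    """Median of `values` where each value counts `weights[i]` times.

    Assumes non-negative integer weights with a positive total.
    Intended to match np.median(np.repeat(values, weights)) without
    building the repeated array.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=np.int64)

    # Sort values and carry the weights along.
    order = np.argsort(values, kind="stable")
    values = values[order]
    cum = np.cumsum(weights[order])
    total = cum[-1]

    # 0-based positions of the middle element(s) in the virtual expanded array.
    # The element at position k is the first value whose cumulative weight exceeds k.
    lo = np.searchsorted(cum, (total - 1) // 2, side="right")
    hi = np.searchsorted(cum, total // 2, side="right")

    # For an odd total lo == hi; for an even total the two middles are averaged.
    return (values[lo] + values[hi]) / 2.0

df = pd.DataFrame({"col1": [100, 200, 300, 400], "col2": [3, 2, 4, 1]})
result = weighted_median(df["col1"], df["col2"])
print(result)  # 250.0
```

Here the total weight is 10, so the two middle elements of the virtual expanded list are 200 and 300, giving 250.0, the same as the median of the expanded list above. The same cumulative-weight idea should carry over to PySpark using a running sum over a window ordered by col1, though that is not shown here.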