I want to calculate the median over groups of columns in a dataframe. Below is an example dataframe. In my actual dataframe, the number of columns, the column names, and the groups are dynamic, because they depend on user input.
import pandas as pd

raw_data = {'a': ['g1', 'g2', 'g3', 'g4', 'g5'], 'b': [10, 11, 12, 13, 14], 'c': [5, 6, 7, 8, 9], 'd': [112, 1, 0, 9, 8], 'e': [6, 7, 8, 0, 9], 'f': [0, 1, 6, 7, 8], 'g': [9, 8, 6, 5, 4]}
mydf = pd.DataFrame(raw_data)
newdf = mydf.set_index('a')
I have another dictionary that stores which columns belong to which group:
gp_dict= {'gp1':['b','c','d'],'gp2':['e','f','g']}
Current dataframe:
     b  c    d  e  f  g
a
g1  10  5  112  6  0  9
g2  11  6    1  7  1  8
g3  12  7    0  8  6  6
g4  13  8    9  0  7  5
g5  14  9    8  9  8  4
My code should compute the row-wise median for each group and keep a row if the median of any group is greater than 8.
Desired output:
     b  c    d  e  f  g
g1  10  5  112  6  0  9
g4  13  8    9  0  7  5
g5  14  9    8  9  8  4
Presumably the intermediate step is to calculate the median for both groups:
     b  c    d  e  f  g  Median_gp1  Median_gp2
g1  10  5  112  6  0  9          10           6
g2  11  6    1  7  1  8           6           7
g3  12  7    0  8  6  6           7           6
g4  13  8    9  0  7  5           9           5
g5  14  9    8  9  8  4           9           8
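
For reference, here is a minimal sketch of one way the steps described above might be expressed in pandas. It is not a definitive solution. It assumes every column listed in gp_dict exists in the dataframe and is numeric. The names medians, threshold, and result are hypothetical, introduced only for illustration.

```python
import pandas as pd

raw_data = {'a': ['g1', 'g2', 'g3', 'g4', 'g5'],
            'b': [10, 11, 12, 13, 14], 'c': [5, 6, 7, 8, 9],
            'd': [112, 1, 0, 9, 8], 'e': [6, 7, 8, 0, 9],
            'f': [0, 1, 6, 7, 8], 'g': [9, 8, 6, 5, 4]}
newdf = pd.DataFrame(raw_data).set_index('a')
gp_dict = {'gp1': ['b', 'c', 'd'], 'gp2': ['e', 'f', 'g']}

# Row-wise median of each group's columns, one "Median_<group>" column per group.
# Iterating over gp_dict keeps this independent of how many groups the user defines.
medians = pd.DataFrame(
    {f'Median_{name}': newdf[cols].median(axis=1) for name, cols in gp_dict.items()}
)

# Keep a row if the median of at least one group exceeds the threshold.
threshold = 8  # hypothetical name; the value 8 comes from the requirement above
result = newdf[(medians > threshold).any(axis=1)]

print(medians)
print(result)
```

Keeping the medians in a separate frame leaves newdf unchanged, so no helper columns need to be dropped after filtering. If the intermediate table is wanted, newdf.join(medians) would attach them.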