I would like to get the median value of one column and use the associated value of another column. For example,

   col1  col2 index
0     1     3     A
1     2     4     A
2     3     5     A
3     4     6     B
4     5     7     B
5     6     8     B
6     7     9     B

I group by `index`, take the median value of `col1` in each group, and use the associated value of `col2` to get

   col1  col2 index
    2     4     A
    5     7     B

I can't use the actual median value for index B because it will average the two middle values and that value won't have a corresponding value in col 2. What's the best way to do this? Will a groupby method work? Or somehow use sort? Do I need to define my own function?

tkr
  • Seems like not even you know what you want? For `index` A it is pretty easy to do, just `df.groupby('index').median()`, but what about `index` B? Why did you choose `5` and not `6`? – rafaelc Jul 09 '18 at 20:11
  • In this case, why did you not pick 6 and 8 for B? – BENY Jul 09 '18 at 20:12
  • For B, either 5 or 6 is acceptable to me. But I can't use 5.5 because there isn't an associated value in the other column. – tkr Jul 09 '18 at 20:13
  • Using `df.groupby('index').median()` would yield both col 1 and col 2 as median values, instead of col 2 corresponding to the median value of col 1. – tkr Jul 09 '18 at 20:17

1 Answer

It seems you need the row at the middle position of each group, not the median, taken from the original df:

df.groupby('index')[['col1','col2']].apply(lambda x : pd.Series(sorted(x.values.tolist())[len(x)//2]))
Out[297]: 
       0  1
index      
A      2  4
B      6  8
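
The one-liner above works because `sorted` compares the `[col1, col2]` row lists element by element, so rows are ordered by `col1` first and `col2` only breaks ties. A more explicit variant that keeps the column names might look like the sketch below. The helper name `middle_row` is hypothetical, and the sample frame is rebuilt from the question; use `(len(ordered) - 1) // 2` instead if you prefer the lower of the two middle rows for even-sized groups.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5, 6, 7],
    "col2": [3, 4, 5, 6, 7, 8, 9],
    "index": list("AAABBBB"),
})

def middle_row(group):
    """Hypothetical helper: return the row at the middle position by col1."""
    ordered = group.sort_values("col1")
    # For an even-sized group this picks the upper of the two middle rows.
    return ordered.iloc[len(ordered) // 2]

result = (
    df.groupby("index")[["col1", "col2"]]
      .apply(middle_row)
      .reset_index()  # turn the group key back into a regular column
)
print(result)
```

This returns a DataFrame with the columns `index`, `col1` and `col2`, so no renaming is needed afterward.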
BENY
  • This works. I just have to convert the result back to a DataFrame afterward and rename the columns. Can you explain how it sorts only by `col1` when `[['col1','col2']]` is in the statement? – tkr Jul 09 '18 at 21:01