Group one column by another column in pandas?

Question

I would like to get the median value of one column and use the associated value of another column. For example,

   col1  col2 index
0     1     3     A
1     2     4     A
2     3     5     A
3     4     6     B
4     5     7     B
5     6     8     B
6     7     9     B

I want to group by `index`, find the median value of col1 within each group, and take the associated value of col2, to get

   col1  col2 index
    2     4     A
    5     7     B

I can't use the actual median value for index B because it will average the two middle values and that value won't have a corresponding value in col 2. What's the best way to do this? Will a groupby method work? Or somehow use sort? Do I need to define my own function?

Seems like not even you know what you want? For `index` A it is pretty easy to do, just `df.groupby('index').median()` but what about for `index` B? Why did you choose `5` and not `6`? — rafaelc, Jul 09 '18 at 20:11
For B, either 5 or 6 is acceptable to me. But I can't use 5.5 because there isn't an associated value in the other column. — tkr, Jul 09 '18 at 20:13
Using `df.groupby('index').median()` would yield both col 1 and col 2 as median values, instead of col 2 corresponding to the median value of col 1. — tkr, Jul 09 '18 at 20:17

score 0 · Accepted Answer · answered Jul 09 '18 at 20:20

It seems you need to take the middle position of each group, not the median, from the original df:

df.groupby('index')[['col1','col2']].apply(lambda x : pd.Series(sorted(x.values.tolist())[len(x)//2]))
Out[297]: 
       0  1
index      
A      2  4
B      6  8
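
The same idea can be sketched in a way that keeps the original column names. This is only an illustration, not part of the original answer: the frame is rebuilt from the sample data in the question, and `middle_row` is a hypothetical helper name introduced here.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5, 6, 7],
    "col2": [3, 4, 5, 6, 7, 8, 9],
    "index": ["A", "A", "A", "B", "B", "B", "B"],
})

def middle_row(group):
    # Hypothetical helper: sort the group's rows by col1 and return
    # the row at position len // 2 (the upper middle for even sizes).
    ordered = group.sort_values("col1")
    return ordered.iloc[len(ordered) // 2]

# One row per group, with col1 and col2 taken from the same original row.
result = df.groupby("index")[["col1", "col2"]].apply(middle_row)
print(result)
```

Because each group returns an actual row, col2 always belongs to the chosen col1 value. The one-liner above reaches the same rows by sorting each group's rows as Python lists, which compares col1 first and only looks at col2 to break ties.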

BENY

This works. I just have to shift back to a dataframe afterward and rename the columns. Can you explain how it only sorts by `col 1` when `[['col1','col2']]` are in the statement? – tkr Jul 09 '18 at 21:01
