Python pandas dataframe groupby selecting columns

Question

I have the following detaframe bb:

bq_selection_id bq_balance  bq_market_id  bq_back_price
0         45094462     185.04       7278437           1.97
1         45094462     185.04       7278437           1.97
2         45094463     185.04       7278437           3.05
3         45094463     185.04       7278437           3.05
4         45094464     185.04       7278437           5.80
5         45094464     185.04       7278437           5.80
6         45094466     185.04       7278437         200.00
7         45094466     185.04       7278437         200.00
8         45094465     185.04       7278437            NaN
9         45094465     185.04       7278437            NaN

I would like to group by "market_id" and take first two lowest "bq_back_price". I managed to do this with

bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2)

The problem is that I am missing some of the columns such as "bq_selection_id", "bq_balance" and column "bq_back_price" does not have name. That is what I get

bq_market_id   
7278437       0    1.97
7278437       1    1.97

And I would like to get something like this

bq_selection_id bq_balance  bq_market_id  bq_back_price
0         45094462     185.04       7278437           1.97
1         45094462     185.04       7278437           1.97

Can you help me please?

score 2 · Answer 1 · answered Mar 28 '16 at 17:34

You can first sort values on bq_back_price, and then take head(2) in each groups.

In [218]: df.sort_values('bq_back_price').groupby('bq_market_id').head(2)
Out[218]:
   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04       7278437           1.97
1         45094462      185.04       7278437           1.97

score 1 · Answer 2 · edited May 23 '17 at 12:31

1

How about appending a new "Rank" column?

bb['rank'] = bb.groupby(['bq_market_id'])['bq_back_price'].rank(ascending=True)

After that, you can filter bb for the lowest 2 prices (rank 1 and 2).

bb[bb['rank'] < 3]

Credit to: python pandas rank by column

edited May 23 '17 at 12:31

Community

1
1

answered Mar 28 '16 at 17:46

wij

1,304
1
8
9

1

You might want to add `method='first' to ensure that no more than 2 rows in each group have rank < 3. – unutbu Mar 28 '16 at 18:38

score 0 · Answer 3 · edited May 23 '17 at 12:15

You can use merge by indexes:

print bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2).reset_index(level=0, name='bq_back_price')
   bq_market_id  bq_back_price
0       7278437           1.97
1       7278437           1.97

print pd.merge(bb[['bq_selection_id','bq_balance']],
               bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2).reset_index(level=0, name='bq_back_price'),
               left_index=True,
               right_index=True)
   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04       7278437           1.97
1         45094462      185.04       7278437           1.97

unutbu deleted nice answer, but I think it is better as my answer:

result = df.groupby('bq_market_id')['bq_back_price'].nsmallest(2)
idx = result.index.get_level_values(-1)
print(df.loc[idx])

   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04       7278437           1.97
1         45094462      185.04       7278437           1.97

Python pandas dataframe groupby selecting columns

3 Answers3

Linked