1

I have the following detaframe bb:

bq_selection_id bq_balance  bq_market_id  bq_back_price
0         45094462     185.04       7278437           1.97
1         45094462     185.04       7278437           1.97
2         45094463     185.04       7278437           3.05
3         45094463     185.04       7278437           3.05
4         45094464     185.04       7278437           5.80
5         45094464     185.04       7278437           5.80
6         45094466     185.04       7278437         200.00
7         45094466     185.04       7278437         200.00
8         45094465     185.04       7278437            NaN
9         45094465     185.04       7278437            NaN

I would like to group by "market_id" and take first two lowest "bq_back_price". I managed to do this with

bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2)

The problem is that I am missing some of the columns such as "bq_selection_id", "bq_balance" and column "bq_back_price" does not have name. That is what I get

bq_market_id   
7278437       0    1.97
7278437       1    1.97

And I would like to get something like this

bq_selection_id bq_balance  bq_market_id  bq_back_price
0         45094462     185.04       7278437           1.97
1         45094462     185.04       7278437           1.97

Can you help me please?

Arthur Zangiev
  • 1,438
  • 3
  • 14
  • 21

3 Answers3

2

You can first sort values on bq_back_price, and then take head(2) in each groups.

In [218]: df.sort_values('bq_back_price').groupby('bq_market_id').head(2)
Out[218]:
   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04       7278437           1.97
1         45094462      185.04       7278437           1.97
Zero
  • 74,117
  • 18
  • 147
  • 154
1

How about appending a new "Rank" column?

bb['rank'] = bb.groupby(['bq_market_id'])['bq_back_price'].rank(ascending=True)

After that, you can filter bb for the lowest 2 prices (rank 1 and 2).

bb[bb['rank'] < 3]

Credit to: python pandas rank by column

Community
  • 1
  • 1
wij
  • 1,304
  • 1
  • 8
  • 9
  • 1
    You might want to add `method='first' to ensure that no more than 2 rows in each group have rank < 3. – unutbu Mar 28 '16 at 18:38
0

You can use merge by indexes:

print bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2).reset_index(level=0, name='bq_back_price')
   bq_market_id  bq_back_price
0       7278437           1.97
1       7278437           1.97

print pd.merge(bb[['bq_selection_id','bq_balance']],
               bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2).reset_index(level=0, name='bq_back_price'),
               left_index=True,
               right_index=True)
   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04       7278437           1.97
1         45094462      185.04       7278437           1.97

unutbu deleted nice answer, but I think it is better as my answer:

result = df.groupby('bq_market_id')['bq_back_price'].nsmallest(2)
idx = result.index.get_level_values(-1)
print(df.loc[idx])

   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04       7278437           1.97
1         45094462      185.04       7278437           1.97
Community
  • 1
  • 1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252