Keep a certain number of rows from a specific group in pandas

Question

I have a DataFrame with several columns. I want to sort by city and keep a certain number of rows where 'city' == 'Buenos Aires', and a different number of rows where 'city' == 'Paris'. What is the best way to do this? I found a way to keep the same number of rows for each group, but I need a different number per group.

    city            number
0   Buenos Aires    14
1   Paris           23
2   Barcelona       12
3   Buenos Aires    14
4   Buenos Aires    14
... ...             ...

Does this answer your question? [Pandas get topmost n records within each group](https://stackoverflow.com/questions/20069009/pandas-get-topmost-n-records-within-each-group) — ljmc, Jan 02 '23 at 23:56

mozway · Accepted Answer · 2023-01-03T00:02:42.000

3

Use groupby.apply with a dictionary that maps each city to the number of rows to keep (cities missing from the dictionary are dropped):

d = {'Buenos Aires': 2, 'Paris': 3}

out = df.groupby('city').apply(lambda g: g.head(d.get(g.name, 0)))

NB. for random rows, use sample in place of head.
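As a minimal sketch of both variants, the block below builds a small frame and applies head and sample per city. Rows 0 to 4 come from the question; rows 5 to 7, the names first_rows and random_rows, and the random_state value are assumptions made for illustration. The min() guard is also an addition: sample raises an error when asked for more rows than a group contains, whereas head simply returns what is there.

```python
import pandas as pd

# Rows 0-4 are from the question; rows 5-7 are made up for illustration.
df = pd.DataFrame({
    'city': ['Buenos Aires', 'Paris', 'Barcelona', 'Buenos Aires',
             'Buenos Aires', 'Paris', 'Paris', 'Paris'],
    'number': [14, 23, 12, 14, 14, 7, 8, 9],
})

d = {'Buenos Aires': 2, 'Paris': 3}

# First n rows per city; cities missing from d get n = 0 and drop out.
first_rows = df.groupby('city').apply(lambda g: g.head(d.get(g.name, 0)))

# Random n rows per city; min() avoids requesting more rows than the group has.
random_rows = df.groupby('city').apply(
    lambda g: g.sample(n=min(d.get(g.name, 0), len(g)), random_state=0)
)

# The last index level holds the original row labels.
print(sorted(first_rows.index.get_level_values(-1)))
```

With this data, head keeps rows 0 and 3 for Buenos Aires and rows 1, 5 and 6 for Paris, and Barcelona is dropped. The result is indexed by city plus the original row label. Newer pandas versions may emit a deprecation warning about apply operating on the grouping column, so treat this as a sketch rather than version-proof code.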

Alternative with groupby.cumcount:

d = {'Buenos Aires': 2, 'Paris': 3}

out = (df[df['city'].map(d).gt(df.groupby('city').cumcount())]
       .sort_values(by='city')
      )
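To make the cumcount approach easier to follow, here is a sketch that splits the mask into named steps. The sample rows 5 to 7 and the names limit and position are assumptions for illustration, and kind='mergesort' is an addition that keeps the original row order within each city.

```python
import pandas as pd

# Rows 0-4 are from the question; rows 5-7 are made up for illustration.
df = pd.DataFrame({
    'city': ['Buenos Aires', 'Paris', 'Barcelona', 'Buenos Aires',
             'Buenos Aires', 'Paris', 'Paris', 'Paris'],
    'number': [14, 23, 12, 14, 14, 7, 8, 9],
})

d = {'Buenos Aires': 2, 'Paris': 3}

# Per-row limit: how many rows this row's city may keep (NaN if not in d).
limit = df['city'].map(d)

# Per-row position within its city: 0, 1, 2, ...
position = df.groupby('city').cumcount()

# Keep a row while its position is below the limit.
# Comparisons with NaN are False, so cities missing from d are dropped.
out = df[limit.gt(position)].sort_values(by='city', kind='mergesort')

print(out)
```

Because this is a plain boolean mask, the result keeps the original flat index and all columns, without the extra city index level that groupby.apply adds.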

edited Jan 03 '23 at 00:02

answered Jan 02 '23 at 23:57

mozway

