
I have a DataFrame with several columns. I want to order it by city and keep a certain number of rows where 'city' == 'Buenos Aires' and a different number of rows where 'city' == 'Paris'. What is the best way to do this? I have found a way to keep the same number of rows for each group, but I want a different number of rows per group.

    city            number
0   Buenos Aires    14
1   Paris           23
2   Barcelona       12
3   Buenos Aires    14
4   Buenos Aires    14
... ...             ...
  • Does this answer your question? [Pandas get topmost n records within each group](https://stackoverflow.com/questions/20069009/pandas-get-topmost-n-records-within-each-group) – ljmc Jan 02 '23 at 23:56

1 Answer


Use groupby.apply with a dictionary mapping each city to the number of rows to keep (cities missing from the dictionary keep no rows):

d = {'Buenos Aires': 2, 'Paris': 3}

out = df.groupby('city').apply(lambda g: g.head(d.get(g.name, 0)))

NB. for random rows, use sample in place of head.
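As a minimal sketch of both variants, assuming made-up sample data modeled on the question's table: the helper `pick` and its `randomize` flag are hypothetical names, and the `min(n, len(g))` cap is an added safeguard because `sample` raises an error when asked for more rows than a group contains.

```python
import pandas as pd

# Hypothetical sample data modeled on the question's table
df = pd.DataFrame({
    'city': ['Buenos Aires', 'Paris', 'Barcelona', 'Buenos Aires',
             'Buenos Aires', 'Paris', 'Paris', 'Paris'],
    'number': [14, 23, 12, 14, 14, 23, 23, 23],
})

d = {'Buenos Aires': 2, 'Paris': 3}

def pick(g, randomize=False):
    # g.name holds the group key (the city); unlisted cities keep 0 rows
    n = d.get(g.name, 0)
    if randomize and n > 0:
        # Cap n at the group size so sample() never asks for too many rows
        return g.sample(n=min(n, len(g)), random_state=0)
    return g.head(n)

first_rows = df.groupby('city').apply(pick)
random_rows = df.groupby('city').apply(pick, randomize=True)
```

Both results are indexed by city and then by the original row label. Recent pandas versions may warn about, or exclude, the grouping column inside apply, so the city is most reliably read from the index of the result.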

Alternative with groupby.cumcount:

d = {'Buenos Aires': 2, 'Paris': 3}

# keep a row while its position within the city (0, 1, 2, ...) is below that city's limit
out = (df[df['city'].map(d).gt(df.groupby('city').cumcount())]
       .sort_values(by='city')
      )
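To make the cumcount logic concrete, here is a step-by-step sketch on made-up sample data modeled on the question's table. The intermediate names `limit` and `position` are illustrative, and `kind='stable'` is an added choice so that rows keep their original order within each city.

```python
import pandas as pd

# Hypothetical sample data modeled on the question's table
df = pd.DataFrame({
    'city': ['Buenos Aires', 'Paris', 'Barcelona', 'Buenos Aires',
             'Buenos Aires', 'Paris', 'Paris', 'Paris'],
    'number': [14, 23, 12, 14, 14, 23, 23, 23],
})

d = {'Buenos Aires': 2, 'Paris': 3}

# Per-row limit; cities missing from d map to NaN
limit = df['city'].map(d)

# Position of each row within its city: 0, 1, 2, ...
position = df.groupby('city').cumcount()

# Keep a row only while its position is below the limit.
# Comparisons against NaN are False, so unlisted cities are dropped.
out = df[limit.gt(position)].sort_values(by='city', kind='stable')
```

Here `out` holds the first two 'Buenos Aires' rows followed by the first three 'Paris' rows, and 'Barcelona' is dropped because it has no entry in `d`.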
mozway