How do I delete specific rows from each group of a dataframe?

Question

I have a CSV data file that I want to process with Python pandas. First, I want to group the data by id. Then I want to delete the rows with the 4 smallest values in each group. Is there a way to do this?

So far I've tried:

if args.process_log:
    datafile = pd.read_csv(args.out_log) 
    grouped_data = datafile.groupby('id')
    datagroup_smallest = grouped_data.apply(lambda x: x.nsmallest(n=4, columns='width'))
    unwanted_index = datagroup_smallest.index.get_level_values(1)

Then I want to delete the unwanted rows by their index. I've tried:

data_splitted = grouped_data.filter(unwanted_index)

I got:

TypeError: 'Int64Index' object is not callable

The word you're looking for is `drop`, not `filter`. – BeRT2me Jul 09 '22 at 04:00
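A minimal sketch of the `drop` approach suggested in the comment above. The sample data is hypothetical, and only the column names `id` and `width` come from the question. It assumes the frame has a unique index, since `drop` removes rows by label.

```python
import pandas as pd

# Hypothetical sample data; column names `id` and `width` follow the question.
datafile = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "width": [5, 1, 9, 3, 7, 2, 8, 6, 4, 10, 0],
})

# Find the 4 smallest `width` values in each `id` group.
# The result has a MultiIndex: (id, original row label).
smallest = datafile.groupby("id")["width"].nsmallest(4)

# The last index level holds the original row labels of the unwanted rows.
unwanted_index = smallest.index.get_level_values(-1)

# Drop those rows from the original (ungrouped) frame by label.
data_splitted = datafile.drop(unwanted_index)
print(data_splitted)
```

Note that `drop` is called on the original DataFrame, not on the groupby object. `groupby(...).filter` expects a function that returns True or False for each group, which is why passing an index to it raises the TypeError.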

score 1 · Answer 1 · answered Jul 09 '22 at 04:08

Consider DataFrame.groupby.nth:

data_splitted = grouped_data.nth[:-4]

Parfait

Thanks for the advice but the 4 smallest numbers are not in order, which means they might not be the first 4 elements in the group. – Maria Sabrina Ma Jul 09 '22 at 05:02
Then, run `sort_values` on id and the column(s) of interest before `groupby`. Without a [reproducible example](https://stackoverflow.com/q/20109391/1422451), I cannot know what those columns are. – Parfait Jul 09 '22 at 15:13
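A rough sketch of the sort-then-`nth` suggestion, assuming the column of interest is `width` (taken from the question's code) and using hypothetical sample data. As written in the answer, `nth[:-4]` keeps everything except the last 4 rows of each group, so the sketch sorts `width` in descending order within each id to make those last 4 rows the smallest ones. The `nth[...]` index notation needs a reasonably recent pandas (1.4 or later, as far as I know).

```python
import pandas as pd

# Hypothetical sample data; column names `id` and `width` follow the question.
datafile = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "width": [5, 1, 9, 3, 7, 2, 8, 6, 4, 10, 0],
})

# Sort by id, then by width descending, so that the 4 smallest widths
# end up as the last 4 rows of each id group.
sorted_data = datafile.sort_values(["id", "width"], ascending=[True, False])

# Keep every row of each group except the last 4.
data_splitted = sorted_data.groupby("id").nth[:-4]
print(data_splitted)
```

Groups with 4 or fewer rows disappear from the result entirely, which may or may not be what you want.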
