
I have a CSV data file that I want to process with pandas in Python. First, I want to group the data by id. Then I want to delete the rows holding the 4 smallest values in each group. Is there a way to do this?

So far I've tried:

if args.process_log:
    datafile = pd.read_csv(args.out_log) 
    grouped_data = datafile.groupby('id')
    datagroup_smallest = grouped_data.apply(lambda x: x.nsmallest(n=4, columns='width'))
    unwanted_index = datagroup_smallest.index.get_level_values(1)

Then I want to delete the unwanted rows by their index. I tried:

data_splitted = grouped_data.filter(unwanted_index)

I got:

TypeError: 'Int64Index' object is not callable
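
The error arises because `filter` on a grouped object expects a function that returns True or False for each group, not an index. A minimal sketch of one way to drop the rows by label instead, assuming the columns are named `id` and `width` as in the snippet above (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; the real file is only assumed to have 'id' and 'width' columns.
datafile = pd.DataFrame({
    "id": [1] * 6 + [2] * 5,
    "width": [5, 1, 9, 3, 7, 2, 8, 6, 4, 10, 0],
})

# The 4 smallest 'width' values per id; the last index level holds the original row labels.
smallest = datafile.groupby("id")["width"].nsmallest(4)
unwanted_index = smallest.index.get_level_values(-1)

# Drop those rows from the original (ungrouped) frame by label.
data_splitted = datafile.drop(index=unwanted_index)
print(data_splitted)
```

Dropping from the original frame rather than from the grouped object sidesteps `filter` entirely, since `filter` keeps or discards whole groups rather than individual rows.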
petezurich

1 Answer


Consider GroupBy.nth, which selects rows by position within each group (here, all but the last four):

data_splitted = grouped_data.nth[:-4]
Parfait
  • Thanks for the advice but the 4 smallest numbers are not in order, which means they might not be the first 4 elements in the group. – Maria Sabrina Ma Jul 09 '22 at 05:02
  • Then run `sort_values` on id and the column(s) of interest before `groupby`. Without a [reproducible example](https://stackoverflow.com/q/20109391/1422451), I cannot know what those columns are. – Parfait Jul 09 '22 at 15:13
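
The sort-then-slice suggestion from the comments could look roughly like the sketch below. It assumes the column of interest is `width` (taken from the question's snippet), uses made-up sample values, and relies on the slice form of `nth`, which is available in pandas 1.4 and later. Sorting `width` in descending order within each id puts the four smallest values last, so `nth[:-4]` removes them; with an ascending sort the slice would be `nth[4:]` instead.

```python
import pandas as pd

# Hypothetical sample data; column names 'id' and 'width' are assumed from the question.
datafile = pd.DataFrame({
    "id": [1] * 6 + [2] * 5,
    "width": [5, 1, 9, 3, 7, 2, 8, 6, 4, 10, 0],
})

# Sort so that, within each id, the smallest widths come last.
sorted_data = datafile.sort_values(["id", "width"], ascending=[True, False])

# Keep every row except the last four of each group (the four smallest widths).
data_splitted = sorted_data.groupby("id").nth[:-4]
print(data_splitted)
```

Groups with four or fewer rows are removed completely by this slice, which may or may not be the desired behavior for the real data.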