How to remove duplicates based on lower frequency

Question

I have a table that looks like this

I want to be able to keep ids for brands that have highest freq. For example in case of audi both ids have same frequencies so keep only one. In case of mercedes-benz keep the latter one since it has frequency 7.

This is my dataframe:

{'Brand': 
 {0: 'audi',
  1: 'audi',
  2: 'bmw',
  3: 'dacia',
  4: 'fiat',
  5: 'ford',
  6: 'ford',
  7: 'honda',
  8: 'honda',
  9: 'hyundai',
  10: 'kia',
  11: 'mercedes-benz',
  12: 'mercedes-benz',
  13: 'nissan',
  14: 'nissan',
  15: 'opel',
  16: 'renault',
  17: 'renault',
  18: 'renault',
  19: 'renault',
  20: 'toyota',
  21: 'toyota',
  22: 'volvo',
  23: 'vw',
  24: 'vw',
  25: 'vw',
  26: 'vw'},
 'id': 
 {0: 'audi_a4_dynamic_2016_otomatik',
  1: 'audi_a6_standart_2015_otomatik',
  2: 'bmw_5 series_executive_2016_otomatik',
  3: 'dacia_duster_laureate_2017_manuel',
  4: 'fiat_egea_easy_2017_manuel',
  5: 'ford_focus_trend x_2015_manuel',
  6: 'ford_focus_trend x_2015_otomatik',
  7: 'honda_civic_eco elegance_2017_otomatik',
  8: 'honda_cr-v_executive_2018_otomatik',
  9: 'hyundai_tucson_elite plus_2017_otomatik',
  10: 'kia_sportage_concept plus_2015_otomatik',
  11: 'mercedes-benz_c-class_amg_2016_otomatik',
  12: 'mercedes-benz_e-class_edition e_2015_otomatik',
  13: 'nissan_qashqai_black edition_2014_manuel',
  14: 'nissan_qashqai_sky pack_2015_otomatik',
  15: 'opel_astra_edition_2016_manuel',
  16: 'renault_clio_joy_2016_manuel',
  17: 'renault_kadjar_icon_2015_otomatik',
  18: 'renault_kadjar_icon_2016_otomatik',
  19: 'renault_mégane_touch_2017_otomatik',
  20: 'toyota_corolla_touch_2015_otomatik',
  21: 'toyota_corolla_touch_2016_otomatik',
  22: 'volvo_s60_advance_2018_otomatik',
  23: 'vw_jetta_comfortline_2013_otomatik',
  24: 'vw_passat_highline_2017_otomatik',
  25: 'vw_tiguan_sport&style_2012_manuel',
  26: 'vw_tiguan_sport&style_2013_manuel'},
 'freq': {0: 4,
  1: 4,
  2: 7,
  3: 4,
  4: 4,
  5: 4,
  6: 4,
  7: 4,
  8: 4,
  9: 4,
  10: 4,
  11: 4,
  12: 7,
  13: 4,
  14: 4,
  15: 4,
  16: 4,
  17: 4,
  18: 4,
  19: 4,
  20: 4,
  21: 4,
  22: 4,
  23: 4,
  24: 7,
  25: 4,
  26: 4}}

Edit: tried one of the answers and got an extra level of header

this may be duplicate with https://stackoverflow.com/questions/25071937/filter-pandas-dataframe-based-on-max-values-in-a-column — Josh Purtell, Oct 14 '20 at 18:53
It's certainly a dup, but I think there's a better one. In this case I'd just sort the drop_duplicates: `df.sort_values('freq').drop_duplicates('Brand', keep='last')` — ALollz, Oct 14 '20 at 18:55
@ALollz don't know why I didnt think of it. The easiest solution. Thanks — Mubashir Raza, Oct 14 '20 at 18:59
The dup i posted has that solution in there somewhere, it winds up being different depending upon whether you want a single value or all values tied for the max, but the `drop_dup` ensures only one. — ALollz, Oct 14 '20 at 19:10

score 0 · Answer 1 · answered Oct 14 '20 at 18:53

0

You need to pandas.groupby Brand and then aggregate with respect to the maximal frequency.

Something like this should work:

df.groupby('Brand')[['id', 'freq']].agg({'freq': 'max'})

answered Oct 14 '20 at 18:53

rjg

569
6
18

One small issue with this one. please check the question again I have edited it – Mubashir Raza Oct 14 '20 at 18:58

Valdi_Bo · Answer 2 · 2020-10-14T19:11:11.967

0

To get your result, run:

result = df.groupby('Brand', as_index=False).apply(
    lambda grp: grp[grp.freq == grp.freq.max()].iloc[0])

edited Oct 14 '20 at 19:11

answered Oct 14 '20 at 19:04

Valdi_Bo

30,023
4
23
41

How to remove duplicates based on lower frequency

2 Answers2