0

I have a table that looks like this enter image description here

I want to be able to keep ids for brands that have highest freq. For example in case of audi both ids have same frequencies so keep only one. In case of mercedes-benz keep the latter one since it has frequency 7.

This is my dataframe:

{'Brand': 
 {0: 'audi',
  1: 'audi',
  2: 'bmw',
  3: 'dacia',
  4: 'fiat',
  5: 'ford',
  6: 'ford',
  7: 'honda',
  8: 'honda',
  9: 'hyundai',
  10: 'kia',
  11: 'mercedes-benz',
  12: 'mercedes-benz',
  13: 'nissan',
  14: 'nissan',
  15: 'opel',
  16: 'renault',
  17: 'renault',
  18: 'renault',
  19: 'renault',
  20: 'toyota',
  21: 'toyota',
  22: 'volvo',
  23: 'vw',
  24: 'vw',
  25: 'vw',
  26: 'vw'},
 'id': 
 {0: 'audi_a4_dynamic_2016_otomatik',
  1: 'audi_a6_standart_2015_otomatik',
  2: 'bmw_5 series_executive_2016_otomatik',
  3: 'dacia_duster_laureate_2017_manuel',
  4: 'fiat_egea_easy_2017_manuel',
  5: 'ford_focus_trend x_2015_manuel',
  6: 'ford_focus_trend x_2015_otomatik',
  7: 'honda_civic_eco elegance_2017_otomatik',
  8: 'honda_cr-v_executive_2018_otomatik',
  9: 'hyundai_tucson_elite plus_2017_otomatik',
  10: 'kia_sportage_concept plus_2015_otomatik',
  11: 'mercedes-benz_c-class_amg_2016_otomatik',
  12: 'mercedes-benz_e-class_edition e_2015_otomatik',
  13: 'nissan_qashqai_black edition_2014_manuel',
  14: 'nissan_qashqai_sky pack_2015_otomatik',
  15: 'opel_astra_edition_2016_manuel',
  16: 'renault_clio_joy_2016_manuel',
  17: 'renault_kadjar_icon_2015_otomatik',
  18: 'renault_kadjar_icon_2016_otomatik',
  19: 'renault_mégane_touch_2017_otomatik',
  20: 'toyota_corolla_touch_2015_otomatik',
  21: 'toyota_corolla_touch_2016_otomatik',
  22: 'volvo_s60_advance_2018_otomatik',
  23: 'vw_jetta_comfortline_2013_otomatik',
  24: 'vw_passat_highline_2017_otomatik',
  25: 'vw_tiguan_sport&style_2012_manuel',
  26: 'vw_tiguan_sport&style_2013_manuel'},
 'freq': {0: 4,
  1: 4,
  2: 7,
  3: 4,
  4: 4,
  5: 4,
  6: 4,
  7: 4,
  8: 4,
  9: 4,
  10: 4,
  11: 4,
  12: 7,
  13: 4,
  14: 4,
  15: 4,
  16: 4,
  17: 4,
  18: 4,
  19: 4,
  20: 4,
  21: 4,
  22: 4,
  23: 4,
  24: 7,
  25: 4,
  26: 4}}

Edit: tried one of the answers and got an extra level of header enter image description here

Mubashir Raza
  • 143
  • 1
  • 6
  • this may be duplicate with https://stackoverflow.com/questions/25071937/filter-pandas-dataframe-based-on-max-values-in-a-column – Josh Purtell Oct 14 '20 at 18:53
  • 1
    It's certainly a dup, but I think there's a better one. In this case I'd just sort the drop_duplicates: `df.sort_values('freq').drop_duplicates('Brand', keep='last')` – ALollz Oct 14 '20 at 18:55
  • @ALollz don't know why I didnt think of it. The easiest solution. Thanks – Mubashir Raza Oct 14 '20 at 18:59
  • The dup i posted has that solution in there somewhere, it winds up being different depending upon whether you want a single value or all values tied for the max, but the `drop_dup` ensures only one. – ALollz Oct 14 '20 at 19:10

2 Answers2

0

You need to pandas.groupby Brand and then aggregate with respect to the maximal frequency.

Something like this should work:

df.groupby('Brand')[['id', 'freq']].agg({'freq': 'max'})
rjg
  • 569
  • 6
  • 18
0

To get your result, run:

result = df.groupby('Brand', as_index=False).apply(
    lambda grp: grp[grp.freq == grp.freq.max()].iloc[0])
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41