
I have a dataframe with name and country columns. A given name may appear with more than one country. For each name, I want to keep only the rows whose country occurs most often for that name and drop the rest.

Input DF:

name    country
John       UK
John       USA
John       USA
Maria      India
Maria      India
Maria      UAE
Tim        Australia

Expected output:

name    country
John       USA
John       USA
Maria      India
Maria      India
Tim        Australia
rahul sharma
  • In your expected output Australia is there, while UK is not. What is the reason? – Andrea Feb 13 '20 at 16:13
  • `df[df['country'].eq(df.groupby('name')['country'].transform(lambda x: x.mode().iat[0]))]` basically take the mode per name, compare it with the `country` column, and keep only the rows that match – anky Feb 13 '20 at 16:13
  • Are you supposed to keep the duplicates when the same name/country combo appears more than once? E.g. is it expected to have two rows of "John, USA"? – LoicM Feb 13 '20 at 16:18
  • Sorry, I also want to keep non-duplicate values @Andrea... Basically, remove the duplicate values with the lower country frequency – rahul sharma Feb 13 '20 at 16:19
  • Yeah, it is supposed to have two rows of John USA. – rahul sharma Feb 13 '20 at 16:19
  • or also `df[df['country'].eq(df.groupby('name')['country'].transform(lambda x: x.value_counts().index[0]))]`, either one works – anky Feb 13 '20 at 16:20
  • If it contains only one row of John/Maria, that is also fine, I guess. – rahul sharma Feb 13 '20 at 16:20
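
The mode-based filter suggested in the comments can be tried on the sample data from the question. This is only a minimal sketch: the variable names `most_common`, `result`, and `deduplicated` are made up for illustration, and it assumes that a tie between countries can be broken arbitrarily (`Series.mode` returns its values sorted, so the alphabetically first one wins here).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["John", "John", "John", "Maria", "Maria", "Maria", "Tim"],
    "country": ["UK", "USA", "USA", "India", "India", "UAE", "Australia"],
})

# For every row, look up the most frequent country of that row's name.
# mode() can return several values on a tie; iat[0] picks the first one.
most_common = df.groupby("name")["country"].transform(lambda s: s.mode().iat[0])

# Keep only the rows whose country equals the most frequent one for the name.
# This keeps John/USA twice, Maria/India twice, and Tim/Australia once.
result = df[df["country"].eq(most_common)]

# Optional: collapse to one row per name, which the asker said is also fine.
deduplicated = result.drop_duplicates()

print(result)
print(deduplicated)
```

A name with a single row, such as Tim, is kept because its only country is trivially the most frequent one.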
