How to keep duplicate values with most occurrence in another column

Asked Feb 13 '20 at 16:11

Active Feb 13 '20 at 16:24

I have a dataframe that contains names and countries. A given name may appear with more than one country, and for each name I want to keep only the rows whose country occurs most often for that name.

Input DF:

name    country
John       UK
John       USA
John       USA
Maria      India
Maria      India
Maria      UAE
Tim        Australia

Expected output:

name    country
John       USA
John       USA
Maria      India
Maria      India
Tim        Australia
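A minimal runnable sketch of one way to produce the expected output. It assumes pandas and the sample data above, and it follows the per-name mode idea from anky's first comment. `most_common` and `result` are illustrative names, not part of the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["John", "John", "John", "Maria", "Maria", "Maria", "Tim"],
    "country": ["UK", "USA", "USA", "India", "India", "UAE", "Australia"],
})

# For every row, look up the most frequent country of that row's name.
# Series.mode() returns its values in sorted order, so .iat[0] picks the
# alphabetically first country if two countries are tied for a name.
most_common = df.groupby("name")["country"].transform(lambda s: s.mode().iat[0])

# Keep only the rows whose country equals the per-name mode;
# repeated name/country pairs (e.g. two "John, USA" rows) are kept.
result = df[df["country"].eq(most_common)]
print(result)
```

Because `transform` returns a Series aligned with `df`, the comparison works row by row and the original index is preserved, so single-row names such as Tim are kept automatically.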

rahul sharma

In your expected output Australia is there, while UK is not. What is the reason? – Andrea Feb 13 '20 at 16:13
`df[df['country'].eq(df.groupby('name')['country'].transform(lambda x: x.mode().iat[0]))]` Basically, take the mode per name, compare it with the `country` column, and keep only the rows that match. – anky Feb 13 '20 at 16:13
Are you supposed to keep the duplicates when the same name/country combination appears more than once? E.g., is it expected to have two rows of "John, USA"? – LoicM Feb 13 '20 at 16:18
Sorry @Andrea, I also want to keep non-duplicate values. Basically, for a repeated name, remove the rows whose country is less frequent. – rahul sharma Feb 13 '20 at 16:19
Yeah, it is supposed to have two rows of John USA. – rahul sharma Feb 13 '20 at 16:19
Or also `df[df['country'].eq(df.groupby('name')['country'].transform(lambda x: x.value_counts().index[0]))]`; either one works. – anky Feb 13 '20 at 16:20
If it contains only one row of John/Maria, that is also fine, I guess. – rahul sharma Feb 13 '20 at 16:20
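A minimal sketch of the two follow-ups in the comments, assuming pandas and the sample data from the question: the `value_counts()` variant of the filter, plus an optional `drop_duplicates()` step for when one row per name is enough. The names `top`, `kept`, and `one_per_name` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["John", "John", "John", "Maria", "Maria", "Maria", "Tim"],
    "country": ["UK", "USA", "USA", "India", "India", "UAE", "Australia"],
})

# value_counts() orders countries by descending frequency, so index[0]
# is the most common country for each name. The order of tied countries
# is not guaranteed to be alphabetical, so on ties this may disagree
# with the mode()-based version.
top = df.groupby("name")["country"].transform(lambda s: s.value_counts().index[0])

# Rows whose country is the most common one for that name (duplicates kept)
kept = df[df["country"].eq(top)]

# If a single row per name is acceptable, collapse repeated name/country pairs
one_per_name = kept.drop_duplicates().reset_index(drop=True)
print(one_per_name)
```

`drop_duplicates()` keeps the first occurrence of each identical row by default; `reset_index(drop=True)` only renumbers the result and can be omitted if the original row labels matter.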

0 Answers