I need to create a dataframe filtering out the five most frequently listed countries in the Nationality column and the total number of times they are listed. I've been trying to use groupby, but have been unsuccessful. The code I've used is:

df.groupby(['Nationality']).sum() 

I also need to determine what percent of those listed as participating in the program have at least one referral. I'm not sure what code to use for this either.

This is part of the dataframe

Henry Ecker
  • So you want to remove rows that contain a nationality that is in the top 5 most frequently listed nationalities? And you also want to count how many rows there are where the nationality is in the top 5? – rangeseeker Oct 21 '21 at 23:50
  • I want to create a new dataframe, showing the top five most frequently listed nationalities, and how many times they are listed. – estridge2014 Oct 22 '21 at 05:24
  • I will update my answer momentarily. – rangeseeker Oct 22 '21 at 19:22

1 Answer

Filter out rows whose Nationality is among the five most frequent nationalities:

df[~df['Nationality'].isin(df['Nationality'].value_counts().index[:5])]

See how many rows contain one of the top 5 nationalities by looking at the shape of the filtered dataframe (the first element is the row count):

df[df['Nationality'].isin(df['Nationality'].value_counts().index[:5])].shape
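If the goal is instead a new dataframe listing the five most frequent nationalities together with their counts (as clarified in the comments), something along these lines may work. This is only a sketch: the sample data and the names `top5` and `Count` are made up for illustration, and only the `Nationality` column comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 'Nationality' column name comes from the question.
df = pd.DataFrame({
    'Nationality': ['US', 'US', 'US', 'UK', 'UK', 'FR', 'FR', 'DE', 'DE',
                    'IT', 'IT', 'ES'],
})

# value_counts() sorts by frequency in descending order, so head(5) keeps the top five.
top5 = (
    df['Nationality']
    .value_counts()
    .head(5)
    .rename_axis('Nationality')   # name the index so it becomes a proper column
    .reset_index(name='Count')    # 'Count' is an illustrative column name
)
print(top5)
```

Note that value_counts() counts rows per value, whereas groupby(...).sum() adds up the numeric columns within each group, which is likely why the original attempt did not give the counts.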

A quick way to see what percent of rows have Number_of_Referalls greater than or equal to 1 (read the True entry of the result):

(df['Number_of_Referalls '] >= 1).value_counts(normalize=True) * 100
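
To get a single number restricted to program participants, one option is to filter first and then take the mean of the boolean mask. This is a sketch under assumptions: the sample data is invented, and `Participating` is a hypothetical column name, since the question does not show how participation is recorded. The referral column name (including its trailing space) is copied from the line above.

```python
import pandas as pd

# Hypothetical sample data. 'Number_of_Referalls ' (with its trailing space) is the
# column name used above; 'Participating' is an assumed column for illustration.
df = pd.DataFrame({
    'Participating': [True, True, True, True, False],
    'Number_of_Referalls ': [0, 1, 3, 0, 2],
})

# Keep only the rows marked as participating.
participants = df[df['Participating']]

# The mean of a boolean Series is the fraction of True values.
pct_with_referral = (participants['Number_of_Referalls '] >= 1).mean() * 100
print(pct_with_referral)
```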
rangeseeker