
I have a dataframe with all FIFA 19 players. I've used groupby to get the top 10 countries with the best players (best meaning the highest mean Overall rating), including only countries with more than 250 players in the dataframe.

df[df.groupby('Nationality')['Overall'].transform('size') > 250].groupby(['Nationality'])['Overall'].mean().nlargest(10)
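Broken into named steps, the one-liner above filters by group size, averages per country, and keeps the largest means. This is a sketch on a small, made-up stand-in for the FIFA 19 data: only the column names Nationality and Overall come from the question, while the rows and the lowered thresholds (more than 2 players, top 2) are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the FIFA 19 dataframe; only the column names
# 'Nationality' and 'Overall' come from the question.
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'Nationality': ['Spain', 'Spain', 'Spain', 'Brazil', 'Brazil', 'Brazil', 'Chile'],
    'Overall': [80, 84, 82, 85, 87, 83, 90],
})

MIN_PLAYERS = 2  # the question uses 250 on the real data
TOP_N = 2        # the question uses 10

# 1. For every row, the size of that row's Nationality group (same length as df).
group_size = df.groupby('Nationality')['Overall'].transform('size')

# 2. Keep only rows from countries with more than MIN_PLAYERS players.
big = df[group_size > MIN_PLAYERS]

# 3. Mean Overall per remaining country, then the TOP_N largest means.
top = big.groupby('Nationality')['Overall'].mean().nlargest(TOP_N)
print(top)
```

Chile has the single highest rating here, but it is dropped in step 2 because it has only one player, so the result lists Brazil (85.0) and Spain (82.0).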

Now I want to get the entire dataframe, with all columns, but only the rows for these top 10 countries. How can I do this?

UPDATE:

Here is a small example to illustrate:

import pandas as pd
df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

df[df.groupby('country')['income'].transform('size') > 1].groupby(['country'])['income'].mean().nlargest(2)
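For reference, the expression above returns a Series indexed by country, not a dataframe. With this data only Brazil has more than one row, so it is the only entry, with mean income (40000 + 42000) / 2 = 41000. A quick check, assuming the same example data:

```python
import pandas as pd

df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

top = df[df.groupby('country')['income'].transform('size') > 1].groupby(['country'])['income'].mean().nlargest(2)

# The result is a Series: the countries are its index, the means its values.
print(type(top).__name__)
print(top)
```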

In this example, I would like to keep only the Brazil rows of the dataframe.

dekio
  • Why not create a sample example so we can replicate this? https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – anky Jun 19 '19 at 16:49
  • Sorry about that @anky_91. I thought it wasn't necessary. I will do this. – dekio Jun 19 '19 at 16:51
  • Never think that; think about the users who will look at this post in the future. This won't help them. :) Hope you understand what I mean. Glad you are willing to create an example – anky Jun 19 '19 at 16:52

1 Answer


You can use the country values from your "top N" result (a Series indexed by country) to subset the original dataframe with isin.

import pandas as pd
df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

top = df[df.groupby('country')['income'].transform('size') > 1].groupby(['country'])['income'].mean().nlargest(2)

df_top = df.loc[df['country'].isin(top.reset_index()['country'])]
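Because the countries are already the index of top, top.index can also be passed to isin directly, without reset_index(). The sketch below does that and wraps the steps in a hypothetical helper, rows_for_top_groups (the name and parameters are illustrative, not part of pandas), so the same code can be pointed at the FIFA data.

```python
import pandas as pd


def rows_for_top_groups(df, group_col, value_col, min_size, n):
    """Hypothetical helper: return all columns of df, but only the rows whose
    group is among the n groups (with more than min_size rows) that have the
    largest mean of value_col."""
    sizes = df.groupby(group_col)[value_col].transform('size')
    top = (df[sizes > min_size]
           .groupby(group_col)[value_col]
           .mean()
           .nlargest(n))
    # The group labels are the index of `top`, so no reset_index() is needed.
    return df[df[group_col].isin(top.index)]


df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice', 'Rick'],
                   'income': [40000, 50000, 42000, 10000],
                   'country': ['Brazil', 'USA', 'Brazil', 'Canada']})

df_top = rows_for_top_groups(df, 'country', 'income', min_size=1, n=2)
print(df_top)
```

On the FIFA dataframe from the question, the equivalent call would be rows_for_top_groups(df, 'Nationality', 'Overall', 250, 10).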
Brendan