I have a dataframe 'df' with index 'Country' and a column 'Estimated Population'.

The index has 15 country names. I also have a dictionary:

ContinentDict  = {'China':'Asia', 
              'United States':'North America', 
              'Japan':'Asia', 
              'United Kingdom':'Europe', 
              'Russian Federation':'Europe', 
              'Canada':'North America', 
              'Germany':'Europe', 
              'India':'Asia',
              'France':'Europe', 
              'South Korea':'Asia', 
              'Italy':'Europe', 
              'Spain':'Europe', 
              'Iran':'Asia',
              'Australia':'Australia', 
              'Brazil':'South America'}

All the countries in the dictionary are present in the dataframe. Using the given dictionary, I need to "group the Countries by Continent, then create a dataframe that displays the mean and std deviation for the estimated population of each country."

This is the code I tried:

df2=df.groupby(ContinentDict)['Estimated Population'].agg({'mean':np.mean,'std':np.std})

When I run this code, I get the error "No numeric types to aggregate".

Then I tried the following code:

df2=pd.to_numeric(df.groupby(ContinentDict)['Estimated Population']).agg({'mean':np.mean,'std':np.std})

This gives me the error "Buffer has wrong number of dimensions (expected 1, got 2)"

How can I eliminate these errors and get the dataframe I need?

Harsha

1 Answer

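The error "No numeric types to aggregate" indicates that the Estimated Population column does not have a numeric dtype (most likely it is object). You need to convert the column to a numeric dtype before applying the .agg function.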

Use:

df['Estimated Population'] = df['Estimated Population'].astype(float)

Or,

df['Estimated Population'] = pd.to_numeric(df['Estimated Population'])
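
Putting the conversion and the grouping together, a minimal sketch might look like the following. The four-row dataframe and its values are made up for illustration (the real df has 15 countries), and the numbers are stored as strings only to reproduce the non-numeric dtype. Note that pd.to_numeric is applied to the column itself, not to the groupby object, which is why the second attempt in the question fails. As far as I know, recent pandas versions no longer accept the dict-of-names form of .agg on a single column, so this sketch passes a list of function names instead.

```python
import pandas as pd

# Hypothetical sample data (values are invented); stored as strings so the
# column starts out with object dtype, as in the question.
df = pd.DataFrame(
    {"Estimated Population": ["100.0", "300.0", "200.0", "400.0"]},
    index=pd.Index(["China", "Japan", "Germany", "France"], name="Country"),
)

# Shortened version of the dictionary from the question.
ContinentDict = {
    "China": "Asia",
    "Japan": "Asia",
    "Germany": "Europe",
    "France": "Europe",
}

# Convert the column (not the groupby object) to a numeric dtype.
df["Estimated Population"] = pd.to_numeric(df["Estimated Population"])

# Passing a dict to groupby maps each index label (country) to its group
# (continent); the list of names produces one output column per statistic.
df2 = df.groupby(ContinentDict)["Estimated Population"].agg(["mean", "std"])
print(df2)
```

If some entries cannot be parsed as numbers, pd.to_numeric(..., errors='coerce') turns them into NaN instead of raising, which may or may not be what you want for your data.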

Shubham Sharma