I am following the University of Michigan MOOC on data science in Python with pandas, and I ran into a problem in one of the assignments.
I have to use groupby to calculate the size, sum, mean and standard deviation of the estimated population of 15 countries, grouped by continent.
The problem is that sum(), std() and size() work fine, but mean() does not, and I don't know why.
I already tried specifying the type with dtype=float,
but it does not work.
This is my code:
# --------- This part works; it is included only for context --------- #
import pandas as pd
Top15 = answer_one() # load the top 15 countries with the most scientific publications
# list of the continents for the top 15 countries
ContinentDict = {'China':'Asia',
                 'United States':'North America',
                 'Japan':'Asia',
                 'United Kingdom':'Europe',
                 'Russian Federation':'Europe',
                 'Canada':'North America',
                 'Germany':'Europe',
                 'India':'Asia',
                 'France':'Europe',
                 'South Korea':'Asia',
                 'Italy':'Europe',
                 'Spain':'Europe',
                 'Iran':'Asia',
                 'Australia':'Australia',
                 'Brazil':'South America'}
# estimate the population of each country
# by dividing Energy Supply by Energy Supply per Capita
Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
Top15 = Top15[['PopEst']]
Top15.reset_index(inplace = True)
Top15['Continent'] = None
# loop that adds the corresponding continent to each country
for country in Top15['Country']:
    index_country = ((Top15.loc[Top15['Country'] == country]) # seek country index
                     .index)
    Top15.iloc[index_country, 2] = ContinentDict[country] # add continent to country
# ---------- This is the part where I am having a problem ---------- #
# create the 'answer' DataFrame
answer = pd.DataFrame(index=['Asia', 'Australia',
                             'Europe', 'North America',
                             'South America'],
                      columns=['size', 'sum', 'mean', 'std'], dtype=float)
grouped = Top15.groupby('Continent') # group countries by continent
answer['size'] = grouped.size()
answer['sum'] = grouped['PopEst'].sum()
answer['mean'] = grouped['PopEst'].mean()
answer['std'] = grouped['PopEst'].std()
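For comparison, here is a more compact way the same steps could be written. This is only a sketch: the four-row frame top15 and the shortened continent_dict are hypothetical stand-ins (values copied from this post), and PopEst is built as a plain float column, so it does not reproduce the error described next.

```python
import pandas as pd

# Hypothetical four-row stand-in for Top15 after reset_index();
# the population values are copied from this post.
top15 = pd.DataFrame({
    'Country': ['China', 'India', 'France', 'Germany'],
    'PopEst': [1367645161.2903225, 1276730769.2307692,
               63837349.39759036, 80369696.96969697],
})
continent_dict = {'China': 'Asia', 'India': 'Asia',
                  'France': 'Europe', 'Germany': 'Europe'}

# Series.map looks each country up in the dict, so no explicit loop is needed.
top15['Continent'] = top15['Country'].map(continent_dict)

# Passing a list of names to agg returns one column per statistic.
answer = top15.groupby('Continent')['PopEst'].agg(['size', 'sum', 'mean', 'std'])
print(answer)
```

With this form there is no need to pre-allocate an empty answer frame: the continents become the index and the four statistics become the columns.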
At the line answer['mean'] = grouped['PopEst'].mean(), I get the error:
DataError: No numeric types to aggregate
I don't know where the problem is.
PopEst contains numeric values. For example, the population estimate of China is 1.36765e+09 people.
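One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: the error message refers to types, not values, and a column can hold numeric values while its dtype is still object. Note also that dtype=float in the code above applies to the answer frame, not to PopEst. The frame df below is a hypothetical three-row stand-in, built with an object column on purpose; the values are copied from this post.

```python
import pandas as pd

# Hypothetical stand-in for Top15: PopEst holds floats but is stored as dtype object.
df = pd.DataFrame({
    'Continent': ['Asia', 'Asia', 'Europe'],
    'PopEst': pd.Series([1367645161.2903225, 1276730769.2307692,
                         63837349.39759036], dtype=object),
})

# The values look numeric, but the column dtype says otherwise.
dtype_before = df['PopEst'].dtype
print(dtype_before)

# pd.to_numeric converts the column to a real numeric dtype.
df['PopEst'] = pd.to_numeric(df['PopEst'])
dtype_after = df['PopEst'].dtype
print(dtype_after)

# With a numeric dtype, the grouped mean can be computed.
means = df.groupby('Continent')['PopEst'].mean()
print(means)
```

If Top15['PopEst'].dtype turns out to be object in the real data, the same conversion applied before the groupby would be the natural thing to try.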
This is the Top15 DataFrame (loaded by answer_one() and processed as above) that I have to work on:
               Country       PopEst      Continent
0            Australia   2.3316e+07      Australia
1               Brazil  2.05915e+08  South America
2               Canada  3.52399e+07  North America
3                China  1.36765e+09           Asia
4               France  6.38373e+07         Europe
5              Germany  8.03697e+07         Europe
6                India  1.27673e+09           Asia
7                 Iran  7.70756e+07           Asia
8                Italy  5.99083e+07         Europe
9                Japan  1.27409e+08           Asia
10  Russian Federation    1.435e+08         Europe
11         South Korea  4.98054e+07           Asia
12               Spain  4.64434e+07         Europe
13      United Kingdom   6.3871e+07         Europe
14       United States  3.17615e+08  North America
This is what Top15.to_dict() returns:
{'Country': {0: 'Australia',
1: 'Brazil',
2: 'Canada',
3: 'China',
4: 'France',
5: 'Germany',
6: 'India',
7: 'Iran',
8: 'Italy',
9: 'Japan',
10: 'Russian Federation',
11: 'South Korea',
12: 'Spain',
13: 'United Kingdom',
14: 'United States'},
'PopEst': {0: 23316017.316017315,
1: 205915254.23728815,
2: 35239864.86486486,
3: 1367645161.2903225,
4: 63837349.39759036,
5: 80369696.96969697,
6: 1276730769.2307692,
7: 77075630.25210084,
8: 59908256.880733944,
9: 127409395.97315437,
10: 143500000.0,
11: 49805429.864253394,
12: 46443396.2264151,
13: 63870967.741935484,
14: 317615384.61538464},
'Continent': {0: 'Australia',
1: 'South America',
2: 'North America',
3: 'Asia',
4: 'Europe',
5: 'Europe',
6: 'Asia',
7: 'Asia',
8: 'Europe',
9: 'Asia',
10: 'Europe',
11: 'Asia',
12: 'Europe',
13: 'Europe',
14: 'North America'}}
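A caveat about the dict above, as a small sketch: to_dict() output carries the values but not the column dtypes, so a frame rebuilt from it may not behave like the original. The example uses only the first three rows of the output above; the names data, rebuilt and object_version are hypothetical.

```python
import pandas as pd

# First three rows copied from the to_dict() output above.
data = {
    'Country': {0: 'Australia', 1: 'Brazil', 2: 'Canada'},
    'PopEst': {0: 23316017.316017315, 1: 205915254.23728815, 2: 35239864.86486486},
    'Continent': {0: 'Australia', 1: 'South America', 2: 'North America'},
}

# Rebuilding from the dict lets pandas infer the dtypes again.
rebuilt = pd.DataFrame(data)
print(rebuilt.dtypes)

# The same values stored in an object column give an identical dict,
# so the dict alone cannot show which dtype the original column had.
object_version = rebuilt.astype({'PopEst': object})
same_dict = rebuilt.to_dict() == object_version.to_dict()
print(same_dict)
```

For that reason the output of Top15.dtypes would be a useful addition next to the dict when describing the data.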