I ran into a problem calculating a correlation. For this assignment we are supposed to use the pandas .corr method.
I searched around but could not find a suitable solution.
Below is the code.
answer_one() returns Top15, a pandas DataFrame.
import numpy as np
import pandas as pd

def answer_nine():
    Top15 = answer_one()  # provided by the assignment, returns the Top15 DataFrame
    # for testing purposes: this works fine
    df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})
    print(df['A'].corr(df['B']))
    Top15['Population'] = Top15['Energy Supply'] / Top15['Energy Supply per capita']
    Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['Population']
    # check my data
    print(Top15['Energy Supply per capita'])
    print(Top15['Citable docs per Capita'])
    correlation = Top15['Citable docs per Capita'].corr(Top15['Energy Supply per capita'])
    print(correlation)
    return correlation

answer_nine()
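The quick test in the code above most likely succeeds because range(4) and a list of ints give integer (int64) columns. A minimal sketch, assuming the Top15 columns hold ordinary numbers stored in object-dtype columns (which is what a printed "dtype: object" suggests), showing how to compare dtypes and convert before calling .corr. The frame name mixed and its values are hypothetical, chosen only to mirror the test frame.

```python
import pandas as pd

# The test frame from the question: both columns are inferred as integers.
df = pd.DataFrame({'A': range(4), 'B': [2 * i for i in range(4)]})
print(df.dtypes)

# Hypothetical frame whose columns were stored with dtype=object,
# mimicking the "dtype: object" reported for the Top15 columns.
mixed = pd.DataFrame({'A': [0, 1, 2, 3], 'B': [0, 2, 4, 6]}, dtype=object)
print(mixed.dtypes)

# Convert every column to a real numeric dtype before correlating.
numeric = mixed.apply(pd.to_numeric)
print(numeric.dtypes)
print(numeric['A'].corr(numeric['B']))
```

Checking Top15.dtypes in the same way would show whether the real columns are numeric or object.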
As far as I can tell this should work. But no, it does not :-(
This is the output I get (the 1.0 comes from the test with df['A'] etc.):
1.0
Country
China 93
United States 286
Japan 149
United Kingdom 124
Russian Federation 214
Canada 296
Germany 165
India 26
France 166
South Korea 221
Italy 109
Spain 106
Iran 119
Australia 231
Brazil 59
Name: Energy Supply per capita, dtype: object
Country
China 9.269e-05
United States 0.000298307
Japan 0.000237714
United Kingdom 0.000318721
Russian Federation 0.000127533
Canada 0.000500002
Germany 0.00020942
India 1.16242e-05
France 0.00020322
South Korea 0.000239392
Italy 0.000180175
Spain 0.00020089
Iran 0.00011442
Australia 0.000374206
Brazil 4.17453e-05
Name: Citable docs per Capita, dtype: object
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-124-942c0cf8a688> in <module>()
22 return correlation
23
---> 24 answer_nine()
<ipython-input-124-942c0cf8a688> in answer_nine()
15 Top15['Citable docs per Capita']=np.float64(Top15['Citable docs per Capita'])
16
---> 17 correlation=Top15['Citable docs per Capita'].corr(Top15['Energy Supply per capita'])
18
19
/opt/conda/lib/python3.5/site-packages/pandas/core/series.py in corr(self, other, method, min_periods)
1392 return np.nan
1393 return nanops.nancorr(this.values, other.values, method=method,
-> 1394 min_periods=min_periods)
1395
1396 def cov(self, other, min_periods=None):
/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in _f(*args, **kwargs)
42 f.__name__.replace('nan', '')))
43 try:
---> 44 return f(*args, **kwargs)
45 except ValueError as e:
46 # we want to transform an object array
/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in nancorr(a, b, method, min_periods)
676
677 f = get_corr_func(method)
--> 678 return f(a, b)
679
680
/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in _pearson(a, b)
684
685 def _pearson(a, b):
--> 686 return np.corrcoef(a, b)[0, 1]
687
688 def _kendall(a, b):
/opt/conda/lib/python3.5/site-packages/numpy/lib/function_base.py in corrcoef(x, y, rowvar, bias, ddof)
2149 # nan if incorrect value (nan, inf, 0), 1 otherwise
2150 return c / c
-> 2151 return c / sqrt(multiply.outer(d, d))
2152
2153
AttributeError: 'float' object has no attribute 'sqrt'
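The last frame of the traceback points at the cause: for an object-dtype array, NumPy ufuncs such as sqrt fall back to calling a .sqrt() method on each element, and a plain Python float has no such method. A small sketch reproducing this outside pandas; depending on the NumPy version the failure surfaces as AttributeError (as in the traceback) or as a TypeError, so the sketch catches both. The values are the first three "Energy Supply per capita" entries from the output.

```python
import numpy as np

# An object-dtype array holding ordinary Python floats, similar to what
# pandas passes down when a column has dtype=object.
values = np.array([93.0, 286.0, 149.0], dtype=object)

try:
    np.sqrt(values)  # the object loop looks for a .sqrt() method on each float
    failed = False
except (AttributeError, TypeError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# Casting to a floating-point dtype lets NumPy use its float64 loop instead.
roots = np.sqrt(values.astype(float))
print(roots.dtype)
```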
I'm sorry, but by now I have no idea what goes wrong or why it doesn't work.
Could anyone point me to the solution?
Thanks.
Edit: the underlying DataFrame looks like this (header plus the first three rows):
Rank Documents Citable documents Citations Self-citations Citations per document H index 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Energy Supply Energy Supply per capita % Renewable
Country
China 1 127050 126767 597237 411683 4.70 138 3.992331e+12 4.559041e+12 4.997775e+12 5.459247e+12 6.039659e+12 6.612490e+12 7.124978e+12 7.672448e+12 8.230121e+12 8.797999e+12 1.271910e+11 93 19.754910
United States 2 96661 94747 792274 265436 8.20 230 1.479230e+13 1.505540e+13 1.501149e+13 1.459484e+13 1.496437e+13 1.520402e+13 1.554216e+13 1.577367e+13 1.615662e+13 1.654857e+13 9.083800e+10 286 11.570980
Japan 3 30504 30287 223024 61554 7.31 134 5.496542e+12 5.617036e+12 5.558527e+12 5.251308e+12 5.498718e+12 5.473738e+12 5.569102e+12 5.644659e+12 5.642884e+12 5.669563e+12 1.898400e+10 149 10.232820
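Using the three rows shown above, a minimal sketch of the same computation with an explicit cast to float before any arithmetic. The frame top3 is rebuilt by hand and deliberately stored as object dtype to mimic the problem; the assumption is that the real Top15 columns behave the same way. With only three countries the coefficient itself is not meaningful; the point is that .corr returns a number once the columns are float64.

```python
import pandas as pd

# Three rows copied from the DataFrame printed above, stored as object
# dtype on purpose to mimic the dtype the real Top15 columns report.
top3 = pd.DataFrame(
    {
        'Citable documents': [126767, 94747, 30287],
        'Energy Supply': [1.271910e+11, 9.083800e+10, 1.898400e+10],
        'Energy Supply per capita': [93, 286, 149],
    },
    index=pd.Index(['China', 'United States', 'Japan'], name='Country'),
    dtype=object,
)

# All columns here are numeric, so cast the whole frame to float64.
top3 = top3.astype(float)

top3['Population'] = top3['Energy Supply'] / top3['Energy Supply per capita']
top3['Citable docs per Capita'] = top3['Citable documents'] / top3['Population']

correlation = top3['Citable docs per Capita'].corr(top3['Energy Supply per capita'])
print(top3['Citable docs per Capita'])
print(correlation)
```

On the real data, the same idea would be to cast the relevant Top15 columns (for example with .astype(float) or pd.to_numeric) before computing Population.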