How to get the correlation between two selected columns in a DataFrame using .corr() Pearson Correlation

Question

I am working with a large DataFrame and am trying to get the correlation between two of its columns. I used this code:
corr_P=Top15['Energy Supply per Capita'].corr(Top15['Energy Supply per Capita'])

It gives me an error saying:
'sqrt' method is not available for 'float' type.

It's for an assignment: I have to use the .corr() method (Pearson's correlation).

Your question is not well formatted; try adding an example DataFrame. See more information here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Erfan, Mar 13 '19 at 16:35

score 1 · Answer 1 · answered Mar 13 '19 at 16:48

You can select the columns the way you have. I ran your code on the same dataset and got no errors, so I would be curious which versions you are using.

Also, your code correlates a column with itself, so I am assuming that is what you wanted to see. If I run that, it gives the correct output of 1:

import re

import pandas as pd


def split_it(line):
    # Keep only the text before the first run of digits (drops trailing footnote numbers).
    return re.split(r'(\d+)', line)[0]


def get_energy():
    # Older pandas versions spell these arguments skip_footer and parse_cols.
    energy = pd.read_excel(
        'C:/Energy Indicators.xls',
        skiprows=17,
        skipfooter=38,
        usecols=list(range(2, 6)),
        index_col=None,
        names=["Country", "Energy Supply", "Energy Supply per Capita", "% Renewable"],
        na_values='...',  # read the '...' placeholder as NaN so the columns stay numeric
    )
    energy['Energy Supply'] = energy['Energy Supply'] * 1000000
    energy['Country'] = energy['Country'].apply(split_it)
    energy = energy.replace("Republic of Korea", "South Korea")
    energy = energy.replace("United States of America", "United States")
    energy = energy.replace('United Kingdom of Great Britain and Northern Ireland', 'United Kingdom')
    energy = energy.replace('China, Hong Kong Special Administrative Region', 'Hong Kong')
    # Remove any parenthesised suffix, then trim surrounding whitespace.
    energy['Country'] = energy['Country'].apply(lambda x: re.sub(r'\(.*\)', '', x))
    # energy.Country = energy.Country.apply(lambda x: x.split(' (')[0])
    energy['Country'] = energy['Country'].map(lambda x: x.strip())
    return energy


Top15 = get_energy()


corr_P = Top15['Energy Supply per Capita'].corr(Top15['Energy Supply per Capita'])

Output:

print (corr_P)
1.0
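
If you still see the 'sqrt' error, one plausible cause (I cannot confirm it without your data) is that the column was loaded with object dtype, for example because the '...' placeholder was not read as NaN. Here is a minimal sketch of how to check for that and work around it. The small DataFrame and its values are made up for illustration and only stand in for Top15; the exact error raised on an object column may also differ between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for Top15; the numbers are invented for illustration.
df = pd.DataFrame({
    "Energy Supply per Capita": [10.0, 20.0, "...", 40.0, 50.0],
    "% Renewable": [2.0, 1.0, 3.0, 5.0, 4.0],
})

# The "..." placeholder mixed in with floats forces the column to object dtype.
print(df["Energy Supply per Capita"].dtype)

# Coerce to numeric; anything that cannot be parsed becomes NaN.
energy = pd.to_numeric(df["Energy Supply per Capita"], errors="coerce")
renewable = df["% Renewable"]

# Pearson is the default method, but naming it makes the intent explicit.
# Rows where either value is NaN are left out of the calculation.
corr_p = energy.corr(renewable, method="pearson")
print(corr_p)  # approximately 0.8 for this made-up data
```

If you want the full pairwise table instead of a single number, selecting both columns from a numeric DataFrame and calling .corr() on the result returns a 2x2 correlation matrix.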
