I'm fairly new to Python and would like to compute the correlation between two DataFrames.

import pandas as pd

df1 = pd.DataFrame({'Date':['2015-01-04','2015-01-05','2015-01-06'],
                   'stockprice1':['1.01','1.01','1.01',],
                   'stockprice2':['1.04','1.05','1.03',]})

df2 = pd.DataFrame({'Date':['2015-01-04','2015-01-05','2015-01-06'],
                   'variable1':['1.11','1.21','1.31',],
                   'variable2':['2.01','2.04','2.03',]})

result = df1.corrwith(df2)

My intended output is a 2x2 table of the correlation coefficients (each stockprice against each variable). However, the code above doesn't seem to work. Does anyone know what I'm doing wrong?

  • Hi there, please provide a text-based data sample with your ideal output [mcve]. Have a read of [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) question and answer to find out how to ask a good pandas question. – Umar.H May 03 '20 at 12:27
  • @Datanovice Hey, sorry about that. I've revised my question to try to isolate the problem. – Terence Lee May 03 '20 at 13:18

1 Answer

A few corrections:

  1. stockprice1 has zero variance, so its correlation with any other variable is NaN.

  2. corrwith matches columns by name, so it is meant for DataFrames that share column names, which is not the case here.

  3. Correlation is defined for numeric variables, but here the values are strings.

Solution: cast all the variables to float with astype, concatenate both DataFrames with concat, call corr, and slice the resulting matrix down to the stockprice rows and variable columns.

# Change df1 so the correlations are not NaN
df1.stockprice1 = ['1.02', '1.09', '1.01']

df1[['stockprice1', 'stockprice2']] = df1[[
    'stockprice1', 'stockprice2']].astype(float)
df2[['variable1', 'variable2']] = df2[[
    'variable1', 'variable2']].astype(float)


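# Drop the non-numeric Date columns first: recent pandas versions raise an
# error when corr() meets string columns instead of silently skipping them.
combined = pd.concat([df1.drop(columns='Date'), df2.drop(columns='Date')], axis=1)

# Keep only the stockprice rows and the variable columns
correlations = combined.corr().iloc[0:2, -2:]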


#               variable1   variable2
# stockprice1   -0.114708   0.675845
# stockprice2   -0.500000   0.327327
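If the positional slice feels fragile, the same idea can be wrapped in a small helper that selects the block by label instead. This is only a sketch: `cross_corr` is a made-up name, not a pandas function, and it assumes the two frames have no column names in common and that every column can be cast to float.

```python
import pandas as pd

def cross_corr(left, right):
    """Correlate every column of `left` with every column of `right`.

    Hypothetical helper; assumes the frames share no column names.
    """
    left = left.astype(float)
    right = right.astype(float)
    full = pd.concat([left, right], axis=1).corr()
    # Pick the off-diagonal block by label rather than by position.
    return full.loc[left.columns, right.columns]

df1 = pd.DataFrame({'Date': ['2015-01-04', '2015-01-05', '2015-01-06'],
                    'stockprice1': ['1.02', '1.09', '1.01'],
                    'stockprice2': ['1.04', '1.05', '1.03']})
df2 = pd.DataFrame({'Date': ['2015-01-04', '2015-01-05', '2015-01-06'],
                    'variable1': ['1.11', '1.21', '1.31'],
                    'variable2': ['2.01', '2.04', '2.03']})

# Date is not numeric, so it is dropped before correlating.
result = cross_corr(df1.drop(columns='Date'), df2.drop(columns='Date'))
print(result)
```

Selecting with `.loc` keeps working if more stockprice or variable columns are added later, whereas `iloc[0:2, -2:]` would need its bounds updated.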
jcaliz