6

I have a set of columns (col1,col2,col3) in dataframe df1 I have another set of columns (col4,col5,col6) in dataframe df2 Assume this two dataframes has the same number of rows.

How do I generate a correlation table that do pairwise correlation between df1 and df2?

the table will look like

    col1 col2 col3
col4 ..   ..   ..
col5 ..   ..   ..
col6 ..   ..   ..

I use df1.corrwith(df2), it does not seem to generate the table as required.

I have a asked a similar question here: How to perform Correlation between two dataframes with different column names but now I am dealing with categorical columns.

If it is not comparable directly, is there a standard way to make it comparable (like using get_dummies)? and is that a faster way to automatically process all fields (assume all are categorical) and calculate their correlation?

Community
  • 1
  • 1
  • get_dummies would make the variables binary. in turn you would be able to apply the correlation function to them. – epattaro Jan 24 '17 at 12:03
  • Pearson correlation requires data to be numeric. The correlation you calculate on binary data has no meaning. Look for the association measures on categorical data. – ayhan Jul 10 '17 at 10:01

1 Answers1

5

You are correct that pd.get_dummies would be needed to get the correlation. Below, I will create some fake data with two categorical columns and then use corrwith

df = pd.DataFrame({'col1':np.random.choice(list('abcde'),100),
                  'col2':np.random.choice(list('xyz'),100)}, dtype='category')
df1 = pd.DataFrame({'col1':np.random.choice(list('abcde'),100),
                   'col2':np.random.choice(list('xyz'),100)}, dtype='category')

dfa = pd.get_dummies(df)
dfb = pd.get_dummies(df1)
dfa.corrwith(dfb)

col1_a   -0.057735
col1_b    0.002513
col1_c    0.137956
col1_d   -0.095050
col1_e   -0.114022
col2_x    0.022568
col2_y   -0.081699
col2_z   -0.128350
Ted Petrou
  • 59,042
  • 19
  • 131
  • 136
  • can we also check the significance of the correlation test statistic using this approach? – Ayan Dec 29 '21 at 09:13