Compute correlation of two DataFrames columnwise

Question

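I have two DataFrames and I want to compute the correlation between every column of the first and every column of the second, without writing the loop myself. This is what I do now: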

import pandas as pd
df1 = pd.DataFrame({'A': range(0,4), 'B': range(14,10,-1)})
df2 = pd.DataFrame({'C': range(104,100,-1), 'D': range(2,6), 'E': range(11,7,-1)})
corr = pd.DataFrame(dict(c1=c1, **{c2:df2[c2].corr(df1[c1]) for c2 in df2.columns})
                    for c1 in df1.columns).set_index("c1")
corr.index.name = None

Now corr is

     C    D    E
A -1.0  1.0 -1.0
B  1.0 -1.0  1.0

Neither DataFrame.corr nor DataFrame.corrwith does what I need.

https://stackoverflow.com/questions/30143417/computing-the-correlation-coefficient-between-two-multi-dimensional-arrays – BENY, Dec 18 '19 at 21:36
Wow, that link was news to me. I wonder why `pandas` favors the double loop in DataFrame.corr()? Is it because the loop makes it easier to support different methods, or is it just a memory concern once you're in the world of 50 columns and 40M+ rows? – ALollz, Dec 18 '19 at 21:51

score 3 · Accepted Answer · edited Dec 19 '19 at 14:53

You can use the methods apply and corrwith:

df2.apply(df1.corrwith)

Output:

     C    D    E
A -1.0  1.0 -1.0
B  1.0 -1.0  1.0
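
To see why this works, here is a minimal sketch using the frames from the question. The names col_c and cross are made up for illustration. As far as I understand it, corrwith accepts a Series and correlates each column of df1 with it, and apply feeds the columns of df2 to it one at a time:

```python
import pandas as pd

df1 = pd.DataFrame({'A': range(0, 4), 'B': range(14, 10, -1)})
df2 = pd.DataFrame({'C': range(104, 100, -1), 'D': range(2, 6), 'E': range(11, 7, -1)})

# Given a Series, corrwith correlates each column of df1 with that Series
# and returns a Series indexed by df1's column labels ('A', 'B').
col_c = df1.corrwith(df2['C'])

# apply calls df1.corrwith once per column of df2; the returned Series
# become the columns 'C', 'D', 'E' of the result.
cross = df2.apply(df1.corrwith)
print(cross)
```

So this computes one correlation per (df1 column, df2 column) pair, although the looping still happens inside pandas rather than being vectorized away.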

edited Dec 19 '19 at 14:53 by sds
answered Dec 18 '19 at 22:35 by Mykola Zotko

score 1 · Answer 2 · answered Dec 18 '19 at 21:40

Concatenate the two frames, then select the df1-by-df2 block of the correlation matrix:

pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df1', 'df2']

     C    D    E
A -1.0  1.0 -1.0
B  1.0 -1.0  1.0
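
Broken into steps, the one-liner looks roughly like this. The intermediate names combined, full and block are only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'A': range(0, 4), 'B': range(14, 10, -1)})
df2 = pd.DataFrame({'C': range(104, 100, -1), 'D': range(2, 6), 'E': range(11, 7, -1)})

# keys= adds an outer column level, so the columns become
# ('df1', 'A'), ('df1', 'B'), ('df2', 'C'), ('df2', 'D'), ('df2', 'E').
combined = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])

# corr() correlates all five columns with each other: a 5 x 5 matrix.
full = combined.corr()

# Keep only the rows that came from df1 and the columns that came from df2.
block = full.loc['df1', 'df2']
print(block)
```

The 5 x 5 intermediate is where the extra work comes from: the df1-by-df1 and df2-by-df2 blocks are computed and then thrown away.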

answered Dec 18 '19 at 21:40 by d_kennetz

Prettier than the other answer, thanks, but still imperfect in that it computes (n+m)^2 correlations instead of n*m correlations. – sds Dec 18 '19 at 21:43
(the other answer I refer to has now been deleted) – sds Dec 19 '19 at 14:52
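
If computing only the n*m cross terms matters, one option is to drop down to NumPy, in the spirit of the question linked in the comments above. This is only a sketch, not something either answer proposes: cross_corr is a hypothetical helper name, and it assumes Pearson correlation, the same number of rows in both frames, no missing values and no constant columns.

```python
import numpy as np
import pandas as pd

def cross_corr(left, right):
    """Pearson correlation of each column of `left` with each column of `right`.

    Hypothetical helper: assumes equal row counts, no NaNs, no constant columns.
    """
    a = left.to_numpy(dtype=float)
    b = right.to_numpy(dtype=float)
    # Center every column on its mean.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Sums of cross products for all n*m column pairs in one matrix product.
    cov = a.T @ b
    # Outer product of the column norms gives the matching denominators.
    norm = np.sqrt((a ** 2).sum(axis=0))[:, None] * np.sqrt((b ** 2).sum(axis=0))[None, :]
    return pd.DataFrame(cov / norm, index=left.columns, columns=right.columns)

df1 = pd.DataFrame({'A': range(0, 4), 'B': range(14, 10, -1)})
df2 = pd.DataFrame({'C': range(104, 100, -1), 'D': range(2, 6), 'E': range(11, 7, -1)})
result = cross_corr(df1, df2)
print(result)
```

Unlike DataFrame.corr, this does no pairwise NaN handling, so it is only a drop-in replacement for complete numeric data.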
