
I have a very large dataset in an .xls file. A small portion of it is shown below:

    Name  V1  V2  V3  V4  V5
    A     2   2   2   1   2
    Ab    10  10  9   1   10
    AC    14  7   1   2   14
    AD    5   1   1   1   5
    AF    14  14  11  1   14
    Ag    3   3   3   1   3
    Qn    7   7   7   3   7
    Ah    35  3   3   1   35

I want to calculate the correlation coefficient for every possible combination of rows, for example Row 1 and Row 2, Row 1 and Row 3, and so on.

I would like the output to look like this:

Name1   Name2     Correlation Coef
A       Ab    
A       AC
.          .
.          .
.          .

I found this question, but I could not figure out how to use it: Calculating Pearson correlation and significance in Python

This solution gives an answer, but I cannot get the output in the format I want: http://lilithelina.tumblr.com/post/135265946959/data-analysis-pearson-correlation-python


1 Answer


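Read your data into a pandas DataFrame (say, df) whose index holds the row names. Then call df.T.corr().unstack().reset_index(). .T transposes the frame so that .corr() correlates rows instead of columns, .unstack() turns the correlation matrix into a Series with a two-level (hierarchical) index, and .reset_index() converts those index levels into columns.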

results = df.T.corr().unstack().reset_index(name="corr")
print(results)
#   level_0 level_1      corr
#0   A1L020  A1L020  1.000000
#1   A1L020  A1X283  0.993933
#2   A1L020  A2A3N6  0.499363
#3   A1L020  A2RTX5  0.408248
#....
results.to_csv("some_file.csv")
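
The result above lists every ordered pair, so it includes each row paired with itself and each pair twice. If only the unique combinations from the question are wanted, one possible variation is sketched below. It is a minimal sketch, not part of the original answer: it assumes the sample rows from the question are typed in as text (the real data would be read from the .xls file instead) and uses itertools.combinations to keep each pair once. The column names Name1, Name2 and Correlation Coef are taken from the desired output in the question.

```python
import io
from itertools import combinations

import pandas as pd

# Sample rows copied from the question (assumption: the real data would be
# loaded from the .xls file with the Name column used as the index).
raw = """Name V1 V2 V3 V4 V5
A 2 2 2 1 2
Ab 10 10 9 1 10
AC 14 7 1 2 14
AD 5 1 1 1 5
AF 14 14 11 1 14
Ag 3 3 3 1 3
Qn 7 7 7 3 7
Ah 35 3 3 1 35
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col="Name")

# Pearson correlation between rows: transpose so rows become columns.
corr = df.T.corr()

# Keep each unordered pair of names once and skip self-pairs.
pairs = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.index, 2)]
results = pd.DataFrame(pairs, columns=["Name1", "Name2", "Correlation Coef"])
print(results)
```

With 8 rows this gives 28 pairs instead of the 64 rows of the full matrix, and results.to_csv(...) works on it the same way as above.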
DYZ