I have 2 dataframes:

import pandas as pd

df = pd.DataFrame({'SAMs': ['GOS', 'BUM', 'BEN', 'AUD', 'VWA', 'HON'],
                   'GN1': [22, 22, 2, 2, 2, 5],
                   'GN2': [1.1, 5.7, 4.8, 7.09, 10.876, 0.178]})
df



   GN1     GN2 SAMs
0   22   1.100  GOS
1   22   5.700  BUM
2    2   4.800  BEN
3    2   7.090  AUD
4    2  10.876  VWA
5    5   0.178  HON

and df2:

df2 = pd.DataFrame({'SAMs': ['FAMS', 'SAP', 'KLM', 'SOS', 'LUD', 'EJT'],
                    'GN1': [22, 22, 2, 2, 2, 5],
                    'GN2': [1.1, 5.7, 4.8, 7.09, 10.876, 0.178]})

I need to calculate Pearson correlations between the entries of the SAMs column in `df` and those in `df2`. That is, I'd like to form every pairwise combination of one SAMs value from `df` with one SAMs value from `df2` and calculate the correlation for each pair.

At the end, the output should look like:

SAMs       correlation_value  P-value
GOS-FAMS   0.45               0.87
GOS-SAP    0.55               1
GOS-KLM    0.15               0.89
...
HON-EJT    0.156              0.98

Any suggestions would be great!

vestland
ARJ
  • I don't get it - your sample code and sample `df` do not match. Can you fix your question so it's consistent? – John Zwinck May 22 '17 at 12:09
  • Sorry, that was a mistake; I have corrected it now. – ARJ May 22 '17 at 12:26
  • Does this help? https://stackoverflow.com/questions/3949226/calculating-pearson-correlation-and-significance-in-python – John Zwinck May 22 '17 at 12:42
  • Thanks for the link, but no, it doesn't help: there the input is two lists, and the correlation is computed between them. In my question, I need to compute the same thing from two data frames. – ARJ May 22 '17 at 13:13
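
One way the pairwise computation described in the question might be approached is sketched below. It is only an illustration and rests on an assumption the question does not confirm: that each SAMs label identifies one row, and that the numeric columns of that row (`GN1`, `GN2`) form the vector to correlate. The names `value_cols`, `left`, `right`, `rows` and `result` are made up for this example. It uses `scipy.stats.pearsonr`, the function discussed in the linked question, applied to each row pair produced by `itertools.product`.

```python
from itertools import product

import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({'SAMs': ['GOS', 'BUM', 'BEN', 'AUD', 'VWA', 'HON'],
                   'GN1': [22, 22, 2, 2, 2, 5],
                   'GN2': [1.1, 5.7, 4.8, 7.09, 10.876, 0.178]})
df2 = pd.DataFrame({'SAMs': ['FAMS', 'SAP', 'KLM', 'SOS', 'LUD', 'EJT'],
                    'GN1': [22, 22, 2, 2, 2, 5],
                    'GN2': [1.1, 5.7, 4.8, 7.09, 10.876, 0.178]})

# Assumption: the numeric columns of a row are the values to correlate.
value_cols = ['GN1', 'GN2']

# Index both frames by label so each row can be looked up by its SAMs value.
left = df.set_index('SAMs')[value_cols]
right = df2.set_index('SAMs')[value_cols]

rows = []
# product() yields every (label from df, label from df2) combination.
for a, b in product(left.index, right.index):
    r, p = pearsonr(left.loc[a].to_numpy(dtype=float),
                    right.loc[b].to_numpy(dtype=float))
    rows.append({'SAMs': f'{a}-{b}', 'correlation_value': r, 'P-value': p})

result = pd.DataFrame(rows, columns=['SAMs', 'correlation_value', 'P-value'])
print(result)
```

Note that with only two numeric columns, every correlation is computed from just two points, so the values are degenerate. The same loop becomes meaningful once each row carries more numeric columns, which can be listed in `value_cols`.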
