I have two .csv files containing correlation matrices exported from R. One file contains the P-values and one contains the r-values. The row and column headers match exactly between the two files.
I am trying to extract the r-values and the corresponding row and column headers, but only for pairs where the P-value is < 0.05. Here is a sample of what the data in the r-value input file looks like (I have 1700+ correlated items, rather than only the two shown):
          Species1  Species2
Species1  1         0.9
Species2  0.9       1
The P-value input file has the same layout, but contains P-values in place of r-values.
I am relatively new to Python and am not sure how to handle files of this type. I have tried a few strategies, including using the csv library to iterate through the files. I looked into using numpy, but it doesn't seem that it will work for me (?). I also looked into using scipy to calculate r- and P-values (Pearson's) in Python, but it seems that this only works for comparing two one-dimensional arrays (I have 1700+ columns of data to correlate).
Code I am starting with, to show you what I have imported:
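import csv
# Note: the 'rU' mode was removed in Python 3.11; on Python 3 the csv module expects newline=''
infileP = open('AllcorrP.csv', newline='')  # P-value matrix
infileR = open('AllcorrR.csv', newline='')  # r-value matrix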
The question: can anyone help me extract the column and row headers and r-values from my r-value file based on significant (< 0.05) P-values from my P-value file?
OR
Calculate the r- and P-values for all possible correlations between many columns of data directly using Python and extract only the results with significant P-values?
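For the second option, a minimal sketch of what this might look like, assuming the data are already loaded into a dict that maps each column name to a list of numbers. The function name `significant_correlations` and the `alpha` parameter are hypothetical; `scipy.stats.pearsonr` only compares two one-dimensional arrays, so the sketch simply calls it once per pair of columns.

```python
from itertools import combinations

from scipy.stats import pearsonr


def significant_correlations(columns, alpha=0.05):
    """Return (name_a, name_b, r, p) for every column pair with p < alpha.

    `columns` is assumed to map a column name to a list of numeric
    observations, with all lists the same length.
    """
    results = []
    # combinations() yields each unordered pair of column names exactly once
    for a, b in combinations(columns, 2):
        r, p = pearsonr(columns[a], columns[b])
        if p < alpha:
            results.append((a, b, float(r), float(p)))
    return results
```

With 1700 columns this loop visits roughly 1.4 million pairs, so it may take a while to run.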
In the end, I would like output in two files.
First file:
Species1 Species2 Species4 ...
Species2 Species1 Species7 ...
etc. (where "Species1" is the first species with significant correlations, and the next items on the line are the species that it is significantly correlated with: Species2, Species4, etc.)
Second file:
Species1 (corr) Species2 = 0.87
Species2 (corr) Species7 = 0.72
...
etc., where each line shows one pairwise correlation and the r-value that goes with it.
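To make the two target formats concrete, here is a rough sketch that writes both files from a list of `(species_a, species_b, r)` tuples. The function name `write_reports` and both path arguments are hypothetical, and it assumes species names contain no spaces, since the first file is space-separated.

```python
from collections import defaultdict


def write_reports(pairs, partners_path, pairwise_path):
    """Write the per-species partner list and the pairwise r-value list."""
    # Record each significant pair in both directions, so every species
    # gets its own line listing all of its partners.
    partners = defaultdict(list)
    for a, b, _ in pairs:
        partners[a].append(b)
        partners[b].append(a)

    # First file: a species followed by everything it correlates with
    with open(partners_path, "w") as out:
        for species, others in partners.items():
            out.write(" ".join([species] + others) + "\n")

    # Second file: one line per pair with its r-value
    with open(pairwise_path, "w") as out:
        for a, b, r in pairs:
            out.write("{} (corr) {} = {:.2f}\n".format(a, b, r))
```

Species appear in the first file in the order they are first seen in `pairs` (dicts keep insertion order on Python 3.7+).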
At this point, I'd be happy to just be able to extract a list of the r-values and species that I want and figure out the final two file formatting later. Thank you!
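For reference, a minimal sketch of the extraction itself using only the csv module. It assumes both files are comma-delimited, that the first row and first column hold the labels (with an empty top-left cell, as R's write.csv produces), and that missing values appear as NA. The helper names `to_float`, `read_matrix`, and `significant_pairs` are hypothetical.

```python
import csv


def to_float(cell):
    """Convert a cell to float; return None for 'NA' or empty cells."""
    try:
        return float(cell)
    except ValueError:
        return None


def read_matrix(path):
    """Read a labelled square matrix into {row_label: {col_label: value}}."""
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    col_names = rows[0][1:]  # skip the top-left corner cell
    matrix = {}
    for row in rows[1:]:
        matrix[row[0]] = dict(zip(col_names, [to_float(v) for v in row[1:]]))
    return col_names, matrix


def significant_pairs(p_path, r_path, alpha=0.05):
    """Return (row_label, col_label, r) for each pair with P < alpha."""
    names, p_values = read_matrix(p_path)
    _, r_values = read_matrix(r_path)
    pairs = []
    for i, a in enumerate(names):
        # The matrix is symmetric, so only the upper triangle is visited;
        # this also skips the diagonal (each species against itself).
        for b in names[i + 1:]:
            p = p_values[a][b]
            if p is not None and p < alpha:
                pairs.append((a, b, r_values[a][b]))
    return pairs
```

Under those assumptions, `significant_pairs('AllcorrP.csv', 'AllcorrR.csv')` would return a list of tuples such as `('Species1', 'Species2', 0.9)`, which could then be formatted however is needed.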