Calculate the Pearson correlation between two lists

Question

I have many identically structured text files containing experimental data (641*976). First I set the correct working directory and collect all the files in a list. This gives me two lists: file.listx, containing my sample data, and file.listy, containing the reference data. Afterwards I rearrange the data in order to run the correlation analysis. The code below shows how I generate the "x" list; the "y" list was generated in exactly the same way from the reference data.

file.listx <- list.files(pattern = "\\.txt$", full.names = TRUE)

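# Read every file into a data frame (tab-separated, no header, first two lines skipped)
datalist = lapply(file.listx, FUN = read.table, header = FALSE, sep = "\t", skip = 2)
# All (row, column) index pairs of the data grid
cmbn = expand.grid(1:641, 1:977)
flen = length(datalist)
# For each grid cell, collect its value from every file into one vector
x = lapply(1:nrow(cmbn), function(t, lst, cmbn) {
  sapply(1:flen, function(i, t1, lst1, cmbn1) {
    lst1[[i]][cmbn1$Var1[t1], cmbn1$Var2[t1]]
  }, t, lst, cmbn)
}, datalist, cmbn)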
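
As a side note on the rearrangement step, the following is a minimal sketch of a more compact way to build the same per-cell list. It assumes every file has identical dimensions; the small sizes and the simulated `datalist` are hypothetical stand-ins for the real files, and `cells` and `x_alt` are names introduced only for this illustration.

```r
set.seed(1)
n_row <- 4; n_col <- 5; n_files <- 3   # small hypothetical sizes, not the real 641-row grid

# Simulated stand-in for the data frames returned by read.table
datalist <- replicate(
  n_files,
  as.data.frame(matrix(rnorm(n_row * n_col), n_row, n_col)),
  simplify = FALSE
)

# One column per file, one row per grid cell (column-major order,
# which matches the row order produced by expand.grid)
cells <- sapply(datalist, function(d) as.vector(as.matrix(d)))

# Same structure as the nested lapply/sapply result: one vector per cell
x_alt <- lapply(seq_len(nrow(cells)), function(t) cells[t, ])
```

Keeping the data in the matrix `cells` instead of converting it back to a list may be more convenient for later numeric work, since each row already holds the values of one cell across all files.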

Now I want to calculate the Pearson correlation between the two lists, using the formula described at http://www.datasciencemadesimple.com/pearson-function-in-excel/ In that formula, my "x" corresponds to the sample and my "y" to the reference.

cor(x, y, method = "pearson")

This raises the error that 'x' must be numeric, and I do not know how to solve it. When I use

x = as.numeric(x)

it seems that the list structure gets lost. The following approach does not solve the problem either.

x = as.matrix(x)

How can I convert my list into a numeric type without losing the structure? I want to calculate the Pearson correlation between the two lists.

Here is the code to generate two dummy lists. This way the error can be reproduced.

x = list(4:10, 10:16, 32:38, 100:106) # sample
y = list(10:16, 20:26, 40:46, 110:116) # reference
cor(x, y, method = "pearson")
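
For context, here is a minimal sketch of why the call above fails and two possible ways to obtain numeric input from the dummy lists. Whether a single overall coefficient (Option A) or a matrix layout (Option B) is the intended result is an assumption; the names `x_vec`, `y_vec`, `x_mat`, `y_mat` and `r_all` are introduced only for this illustration.

```r
x <- list(4:10, 10:16, 32:38, 100:106)   # sample
y <- list(10:16, 20:26, 40:46, 110:116)  # reference

# A list is not a numeric vector, which is what cor() complains about
is.numeric(x)             # FALSE
# as.matrix() on a list yields a list-matrix, which is still not numeric
is.numeric(as.matrix(x))  # FALSE

# Option A (assumes one overall coefficient is wanted): flatten both lists
x_vec <- unlist(x)
y_vec <- unlist(y)
r_all <- cor(x_vec, y_vec, method = "pearson")

# Option B: keep the grouping as a numeric matrix, one row per list element
x_mat <- do.call(rbind, x)
y_mat <- do.call(rbind, y)
dim(x_mat)                # 4 rows, 7 columns
```

Option B preserves which values belong to which list element, so row-wise or column-wise correlations can still be computed afterwards.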

Please provide a sample of your data. In any case, did you try removing non numeric columns? — NelsonGon, Aug 01 '19 at 13:59
My list does only contain numeric values. For example x[5] returns a vector with numeric values. — stefx, Aug 01 '19 at 14:03
Alright, please provide a sample of your data. See [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for clues. — NelsonGon, Aug 01 '19 at 14:07
Is this what you need? `Map(cor,x,y)` or `lapply(seq_along(x), function(ind) cor(x[[ind]],y[[ind]]))`? — NelsonGon, Aug 01 '19 at 14:32
No, not really. I want to apply the Pearson function exactly as described here: http://www.datasciencemadesimple.com/pearson-function-in-excel/ In the numerator we subtract the mean of a list from each value within the list, for both the sample (x) and the reference (y) data. In the denominator we apply exactly the same mathematical operations and additionally take the square root of the result. — stefx, Aug 01 '19 at 14:34
Can't quite get. Do you want to define your own correlation calculation or use the inbuilt `cor` function? — NelsonGon, Aug 01 '19 at 14:39
Your last suggestion works! And all the values are 1. Does this make sense for the dummy data set? — stefx, Aug 01 '19 at 14:48
Well, kinda puzzled me. On phone now. Would need to look at it more. — NelsonGon, Aug 01 '19 at 15:33
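
On the result discussed in the comments, here is a minimal sketch, using the dummy lists from the question, of why the element-wise approach gives 1 for every pair. In the dummy data each reference vector is the corresponding sample vector shifted by a constant, so each pair is perfectly linearly related. `pearson_manual` is a hypothetical helper that writes out the textbook formula (sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations) so that it can be compared with the built-in `cor`.

```r
x <- list(4:10, 10:16, 32:38, 100:106)   # sample
y <- list(10:16, 20:26, 40:46, 110:116)  # reference

# Hypothetical helper: Pearson's r written out from the textbook formula
pearson_manual <- function(a, b) {
  da <- a - mean(a)   # deviations of the sample from its mean
  db <- b - mean(b)   # deviations of the reference from its mean
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

# One coefficient per pair of list elements
r_builtin <- mapply(cor, x, y)
r_manual  <- mapply(pearson_manual, x, y)

# Each y[[i]] is x[[i]] plus a constant, so every difference vector is constant
shifts <- Map(function(a, b) unique(b - a), x, y)

all.equal(r_builtin, r_manual)
```

Real measurement data will generally not be an exact shifted copy of the reference, so the coefficients there would be expected to differ from 1.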
