I've got two data frames, each with 150 rows and 10 columns, plus column and row IDs. I want to correlate every row in one data frame with every row in the other (i.e. 150 x 150 correlations) and plot the distribution of the resulting 22,500 values. (Then I want to calculate p-values etc. from the distribution, but that's the next step.)

Frankly, I don't know where to start with this. I can read my data in, and I can see how to correlate vectors or matching slices of two matrices, but I can't get a handle on what I'm trying to do here.

Paul Hiemstra
  • Please help us help you by providing us with a reproducible example (i.e. code and example data), see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for details. – Paul Hiemstra May 16 '13 at 08:00

2 Answers

set.seed(42)
DF1 <- as.data.frame(matrix(rnorm(1500),150))
DF2 <- as.data.frame(matrix(runif(1500),150))

#transform to matrices for better performance
m1 <- as.matrix(DF1)
m2 <- as.matrix(DF2)

#use outer to get all combinations of row indices and apply a function to each pair
#22500 combinations is small enough to fit into RAM
#outer needs a vectorized function; Vectorize provides one,
#but it is just a hidden loop (slow for a huge number of rows)
cors <- outer(seq_len(nrow(m1)), seq_len(nrow(m2)),
     FUN = Vectorize(function(i, j) cor(m1[i, ], m2[j, ])))
hist(cors)

[Image: histogram produced by hist(cors)]
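
As a small illustration of why the answer wraps its function in Vectorize, here is a minimal sketch on toy indices. The helper name pair_sum is hypothetical and stands in for any function that only handles one (i, j) pair at a time, as cor(m1[i,], m2[j,]) does.

```r
# A scalar-only function: it collapses its inputs to a single number
pair_sum <- function(i, j) sum(i, j)

# outer() passes all index pairs to FUN in one call, so a function that
# returns a single value instead of one value per pair makes outer() fail
res <- try(outer(1:2, 1:3, FUN = pair_sum), silent = TRUE)
inherits(res, "try-error")

# Vectorize() wraps the function in an mapply() loop, one call per (i, j) pair
out <- outer(1:2, 1:3, FUN = Vectorize(pair_sum))
out
```

The same reasoning applies to the correlation function above: it can only correlate one pair of rows per call, so outer needs the Vectorize wrapper around it.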

Roland
  • +1, but could you also add a bit of description to the code? That makes it even more useful for others (including the OP). – Paul Hiemstra May 16 '13 at 08:09
  • Hi Roland, Thank you so much - it did the trick AND I've learnt a whole lot. Cheers. (Oh and thank you too Paul). – user2388815 May 18 '13 at 08:09

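You can use cor with two arguments. Because cor correlates the columns of its inputs, transpose both matrices so that each original row becomes a column: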

cor( t(m1), t(m2) )
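
To make this concrete, here is a self-contained sketch that rebuilds matrices like those in the other answer (same seed and dimensions, which is an assumption about the data, not the OP's real data) and spot-checks one entry against a direct row-to-row correlation. The name cors_fast is only for illustration.

```r
set.seed(42)
m1 <- matrix(rnorm(1500), nrow = 150)
m2 <- matrix(runif(1500), nrow = 150)

# cor() correlates columns, so transpose to make each original row a column
cors_fast <- cor(t(m1), t(m2))
dim(cors_fast)

# entry [i, j] should be the correlation of row i of m1 with row j of m2
all.equal(cors_fast[3, 7], cor(m1[3, ], m2[7, ]))

# distribution of all pairwise correlations
hist(cors_fast)
```

This avoids the per-pair loop entirely, since a single call to cor computes the whole matrix.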
Vincent Zoonekynd