
I am trying to run cor.test() over the rows of two matrices, cer and par:

cerParCorTest <- mapply(function(x, y) cor.test(x, y), cer, par)

mapply, however, does not iterate over rows: it walks over the columns of a data frame or the individual elements of a matrix.

This issue has been discussed in the question "Efficient apply or mapply for multiple matrix arguments by row". I tried the split solution from there:

cer <- split(cer, row(cer))
par <- split(par, row(par))

but it produces the following warning (and it is slow):

In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
data length is not a multiple of split variable
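One plausible explanation, assuming cer and par are data frames rather than matrices (the V1698-style column names look like read.table() output, and the warning comes from the split.default(x = seq_len(nrow(x)), ...) call that split.data.frame() makes): row() returns one label per cell, but split.data.frame() splits only the row indices, so the lengths do not match. A minimal sketch with a hypothetical toy data frame, cer_df:

```r
# Hypothetical 3 x 4 data frame standing in for `cer`
# (assumption: the real object is a data.frame, not a matrix)
cer_df <- data.frame(V1 = c(1, 2, 3), V2 = c(2, 4, 7),
                     V3 = c(5, 1, 0), V4 = c(3, 3, 8))

# row() gives 12 labels (one per cell), but split.data.frame() splits
# the 3 row indices, so R warns about the length mismatch
msg <- tryCatch(split(cer_df, row(cer_df)),
                warning = function(w) conditionMessage(w))
print(msg)

# Converting to a numeric matrix first lets split() work cell by cell,
# giving one numeric vector per row
cer_m <- as.matrix(cer_df)
rows <- split(cer_m, row(cer_m))
str(rows)
```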

I also tried t(par) and t(cer) to make it run over the rows, but that produces the error

Error in cor.test.default(x, y) : not enough finite observations
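A likely reason the transposed version fails: mapply() does not iterate over rows or columns of a matrix at all. A matrix is a vector with a dim attribute, so mapply() passes one element at a time, and cor.test() with the default Pearson method needs at least 3 complete pairs. A small sketch with toy matrices a and b (illustrative names only):

```r
a <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
b <- matrix(c(2, 1, 4, 3, 6, 5), nrow = 2)

# mapply() walks the matrices element by element (6 calls), not row by row
lens <- mapply(function(x, y) length(x), a, b)
print(lens)

# Each call therefore hands cor.test() a single number per argument
err <- tryCatch(cor.test(a[1], b[1]),
                error = function(e) conditionMessage(e))
print(err)
```

Transposing only changes the order in which the elements are visited, so the same error appears with t(cer) and t(par).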

The matrices look like this (cer is shown; par has the same structure):

                 V1698       V1699       V1700      V1701
YAL002W(cer)  0.01860500  0.01947700  0.02043300  0.0214740
YAL003W(cer)  0.07001600  0.06943900  0.06891200  0.0684330
YAL005C(cer)  0.02298100  0.02391900  0.02485800  0.0257970
YAL007C(cer) -0.00026047 -0.00026009 -0.00026023 -0.0002607
YAL008W(cer)  0.00196200  0.00177360  0.00159490  0.0014258

My questions are: why does transposing the matrix not work, and what is a short way to run cor.test() over the rows with mapply?

Apologies for the long post, and thanks in advance for any help.

noqa

2 Answers


I don't know the dimensions of your matrices, but this works fine for me:

N <- 3751 * 1900
cer.m <- matrix(1:N,ncol=1900)
par.m <- matrix(1:N+rnorm(N),ncol=1900)
ll <- mapply(cor.test,
             split(par.m,row(par.m)),
             split(cer.m,row(cer.m)),
             SIMPLIFY=FALSE)

This gives you a list of 3751 elements (one cor.test result per row).
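A scaled-down sketch of the same call (the 5 x 10 toy matrices are arbitrary, chosen so it runs instantly). Each list element is a full "htest" object, so the estimate, p-value and confidence interval for that row are all available:

```r
set.seed(1)
cer.m <- matrix(rnorm(50), nrow = 5)
par.m <- cer.m + matrix(rnorm(50, sd = 0.5), nrow = 5)

ll <- mapply(cor.test,
             split(cer.m, row(cer.m)),
             split(par.m, row(par.m)),
             SIMPLIFY = FALSE)

length(ll)        # one result per row
class(ll[[1]])    # each result is an "htest" object
ll[[1]]$p.value   # p-value for the first row
```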

EDIT: Without split, you can pass the row indices instead; this should be faster:

ll <- mapply(function(x,y)cor.test(cer.m[x,],par.m[y,]),
             1:nrow(cer.m),
             1:nrow(cer.m),
             SIMPLIFY=FALSE)
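Since the same index is passed for both arguments, a single-index lapply() over seq_len(nrow(cer.m)) does the same job. This is only a stylistic variant of the code above, not a claim about speed; the toy matrices are arbitrary:

```r
set.seed(42)
cer.m <- matrix(rnorm(40), nrow = 4)
par.m <- matrix(rnorm(40), nrow = 4)

# seq_len() also behaves correctly for a zero-row matrix,
# where 1:nrow(cer.m) would give c(1, 0)
ll <- lapply(seq_len(nrow(cer.m)),
             function(i) cor.test(cer.m[i, ], par.m[i, ]))
```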

EDIT2: To get the estimate value, for example:

sapply(ll,'[[','estimate')
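If you want several statistics per row, one option is to collect them into a data frame. This sketch uses vapply() so that R checks that each extracted value is a single number; the column names estimate and p.value are just illustrative choices:

```r
set.seed(7)
cer.m <- matrix(rnorm(60), nrow = 6)
par.m <- matrix(rnorm(60), nrow = 6)
ll <- lapply(seq_len(nrow(cer.m)),
             function(i) cor.test(cer.m[i, ], par.m[i, ]))

# unname() drops the "cor" label that each estimate carries
res <- data.frame(
  estimate = vapply(ll, function(r) unname(r$estimate), numeric(1)),
  p.value  = vapply(ll, function(r) r$p.value, numeric(1))
)
head(res)
```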
agstudy

You could always just write a for loop; it seems reasonably fast at these dimensions:

x1 <- matrix(rnorm(10000000), nrow = 2000)
x2 <- matrix(rnorm(10000000), nrow = 2000)


out <- vector("list", nrow(x1))

system.time(
for (j in seq_along(out)) {
  out[[j]] <- cor.test(x1[j, ], x2[j, ])
}
)
   user  system elapsed 
   1.35    0.00    1.36

EDIT: If you only want the estimate, I wouldn't store the results in a list but in a simple numeric vector:

out2 <- vector("numeric", nrow(x1))

# Loop over out2 itself so this does not depend on `out` existing
for (j in seq_along(out2)) {
  out2[j] <- cor.test(x1[j, ], x2[j, ])$estimate
}
head(out2)
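If only the Pearson estimate is needed, a different technique avoids calling cor.test() per row entirely: centre each row and compute the correlation with rowSums(). This is a swapped-in approach, not what the answer above uses. It should give the same r as cor.test()'s default Pearson estimate for rows without missing values, but no p-values or confidence intervals. The helper row_cor is a hypothetical name, and the matrices are smaller than above:

```r
set.seed(123)
x1 <- matrix(rnorm(2000 * 50), nrow = 2000)
x2 <- matrix(rnorm(2000 * 50), nrow = 2000)

# Hypothetical helper: row-wise Pearson correlation (assumes no NA values)
row_cor <- function(a, b) {
  ac <- a - rowMeans(a)   # rowMeans() recycles down columns, centring each row
  bc <- b - rowMeans(b)
  rowSums(ac * bc) / sqrt(rowSums(ac^2) * rowSums(bc^2))
}

r_fast <- row_cor(x1, x2)
head(r_fast)
```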

If you want to store all the results and simply extract the estimate from each, then this should do the trick:

> out3 <- as.numeric(sapply(out, "[", "estimate"))
#Confirm they are the same
> all.equal(out2, out3)
[1] TRUE

The tradeoff is that the first method stores all the results in a list, which may be useful for further processing, whereas the second, simpler method grabs only what you initially wanted.

Chase
  • I got "x must be a numeric vector" with cor.test(x1[j, ], x2[j, ]). It must be a simple error on my part. Can you help? Thanks. – noqa Mar 19 '13 at 21:22
  • @bioinformant - the error is telling you that your data is not numeric. You should double check what your data is with `str()`. If my test code did not run, I'm not entirely sure what would have gone wrong as it works on my machine... – Chase Mar 19 '13 at 21:58
  • Is there a way to get only the cor value? I tried list$estimate and it does not work.
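Regarding the "x must be a numeric vector" error in the comments: one common way to hit it, assuming the data were read into a data frame (for example with read.table()), is that indexing a row of a data frame returns a one-row data frame, not a numeric vector, and cor.test() rejects it. A sketch with hypothetical toy data frames df1 and df2:

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 6, 5), c = c(7, 8, 10))
df2 <- data.frame(a = c(2, 1, 3), b = c(5, 5, 4), c = c(9, 7, 12))

str(df1[1, ])   # still a data.frame with 3 variables, not a vector

err <- tryCatch(cor.test(df1[1, ], df2[1, ]),
                error = function(e) conditionMessage(e))
print(err)

# Converting to numeric matrices makes each row a plain numeric vector
m1 <- as.matrix(df1)
m2 <- as.matrix(df2)
ct <- cor.test(m1[1, ], m2[1, ])
```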