trouble in loop in R: matrix or data frame in the beginning?

Question

I have a data frame (new_t) whose rows are strains (28 of them) and whose columns are genes (12559 of them); the cells hold the expression values for these genes. I want to see the correlation of each gene with the last one, so I want to compare each column, as a vector, to the last column vector.

> rr<-matrix()
> for (i in 1:ncol(new_t)) {
  bb<-cor(x=new_t[,i], method='spearman', y=new_t[,12559])
  rr<-cbind(rr, bb)
  }

My problem is that when the loop finishes, the rr that is formed seems to be composed entirely of bb's, as in bb bb bb bb...

If I change rr into a data frame, it gives me an error:

Error in data.frame(..., check.names = FALSE) : 
 arguments imply differing number of rows: 0, 1

any help is appreciated

Jilber Urbina · Accepted Answer · 2013-09-03T16:30:56.703

You can use apply to avoid the for loop and get the same results.

A toy example

> set.seed(1)
> new_t <- matrix(rnorm(100, 100, 3), 10)
> apply(new_t, 2, cor, method="spearman", y=new_t[,10])
 [1] -0.30909091 -0.17575758  0.41818182 -0.36969697 -0.33333333  0.10303030 -0.18787879 -0.36969697
 [9]  0.01818182  1.00000000

I think that with your data it should be:

apply(new_t, 2, cor, method="spearman", y=new_t[,12559])

Or, even simpler, use cor on its own without apply and select the last column of the correlation matrix.

> cor(new_t, method="spearman")[, ncol(new_t)]
 [1] -0.30909091 -0.17575758  0.41818182 -0.36969697 -0.33333333  0.10303030 -0.18787879 -0.36969697
 [9]  0.01818182  1.00000000
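One caveat with the full correlation matrix: with 12559 columns it has 12559 x 12559 entries, which is on the order of a gigabyte of doubles, even though only one column of it is needed. A minimal sketch of an alternative, assuming the same toy new_t as above, is to pass the last column as the y argument of cor so that only the required correlations are computed. The names last_col, r_direct and r_full are made up for this illustration.

```r
set.seed(1)
new_t <- matrix(rnorm(100, 100, 3), 10)   # same toy matrix as above

# Correlate every column with the last one directly. cor() returns a
# one-column matrix here, so drop() turns it into a plain vector.
last_col <- new_t[, ncol(new_t)]
r_direct <- drop(cor(new_t, last_col, method = "spearman"))

# For comparison: the last column of the full correlation matrix.
r_full <- cor(new_t, method = "spearman")[, ncol(new_t)]

# Both approaches should agree up to floating-point tolerance.
all.equal(r_direct, r_full)
```

On the toy data both routes give the same numbers; the difference only matters when the number of columns is large.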

thank you i used the apply function and solved the problem. thanks for the fast response. — user2091290, Sep 04 '13 at 12:39

score 0 · Answer 2 · edited May 23 '17 at 12:31

From the help page of cbind (?cbind):

If there are several matrix arguments, they must all have the same number of columns (or rows) and this will be the number of columns (or rows) of the result. If all the arguments are vectors, the number of columns (rows) in the result is equal to the length of the longest vector. Values in shorter arguments are recycled to achieve this length (with a warning if they are recycled only fractionally).

When the arguments consist of a mix of matrices and vectors the number of columns (rows) of the result is determined by the number of columns (rows) of the matrix arguments. Any vectors have their values recycled or subsetted to achieve this length.

...

The cbind data frame method is just a wrapper for data.frame(..., check.names = FALSE). This means that it will split matrix columns in data frame arguments, and convert character columns to factors unless stringsAsFactors = FALSE is specified.

I suspect that you are mixing up the number of rows. I'm not sure why you are getting your error with matrix(), as you did not provide a reproducible example. Applying cbind to data.frame() throws an error because the number of rows does not match: the empty data frame has 0 rows, while the value being bound has 1.

## this seems to work
cbind(matrix(),cor(1:10,2:11))
#      [,1] [,2]
# [1,]   NA    1

## this throws an error
cbind(data.frame(),1)
# Error in data.frame(..., check.names = FALSE) : 
#   arguments imply differing number of rows: 0, 1

You'd be better off avoiding the for-loop altogether and using apply or sapply:

sapply(seq_len(ncol(new_t)), function(i) 
  cor(x=new_t[,i], method='spearman', y=new_t[,12559]))
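If a for loop is still preferred, a minimal sketch of a working version is to preallocate a numeric vector and fill it by index instead of growing a matrix with cbind. The sketch also reproduces the "bb bb bb" symptom on toy data: as far as I can tell, cbind labels each column built from a vector argument with that argument's name, so the repeated bb's are column names and the correlations are the values underneath. The toy data frame and the names rr_bad and last are assumptions made for this illustration.

```r
set.seed(1)
new_t <- as.data.frame(matrix(rnorm(100, 100, 3), 10))  # toy stand-in for the real data
last <- ncol(new_t)

# Reproduce the symptom: growing a matrix with cbind() names each new
# column after the variable being bound ("bb"); the values are still stored.
rr_bad <- matrix()
for (i in 1:3) {
  bb <- cor(new_t[, i], new_t[, last], method = "spearman")
  rr_bad <- cbind(rr_bad, bb)
}
colnames(rr_bad)

# Preallocate one slot per column and fill it in place.
rr <- numeric(last)
for (i in seq_len(last)) {
  rr[i] <- cor(new_t[, i], new_t[, last], method = "spearman")
}
names(rr) <- colnames(new_t)   # label each correlation with its column name
rr
```

Using ncol(new_t) rather than the hard-coded 12559 keeps the loop valid if the number of columns changes.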
