
This might seem similar to a question that was asked earlier (Apply PCA on very large sparse matrix).

However, I still could not find my answer there, so I need some help. I am trying to perform a PCA on a very large dataset of about 700 samples (columns) and more than 400,000 loci (rows). I want to plot the samples in the biplot, so I want to use all 400,000 loci to calculate the principal components.

I tried using princomp(), but I get the following error:

Error in princomp.default(transposed.data, cor = TRUE) : 
'`princomp`' can only be used with more units than variables

I checked the forums and saw that when there are fewer units than variables, it is better to use prcomp() than princomp(), so I tried that as well, but I again get an error:

Error in cor(transposed.data) : allocMatrix: too many elements specified

Could anyone suggest another option that is better suited to data this large? I am a beginner in statistics, but I have read about how PCA works. Are there any other easy-to-use R packages or tools that can do this?
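For context, the second error is consistent with cor() trying to build a loci-by-loci matrix: 400,000 x 400,000 is 1.6e11 elements, which R cannot allocate. One common workaround, sketched below on toy data, is to eigen-decompose the small samples-by-samples Gram matrix instead, which yields the same sample coordinates. The sizes `n` and `p`, the random matrix `X`, and the choice of `k` are illustrative assumptions, not the real data, and the sketch centres the loci without scaling them (so it is not the `cor = TRUE` variant).

```r
# Toy stand-in for the real data (assumption): 20 samples (rows) x 500 loci (columns).
set.seed(1)
n <- 20; p <- 500
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Centre each locus (column). Scaling to unit variance is deliberately left out.
Xc <- scale(X, center = TRUE, scale = FALSE)

# Build the n x n Gram matrix instead of the p x p covariance/correlation matrix.
G  <- tcrossprod(Xc)                 # same as Xc %*% t(Xc)
eg <- eigen(G, symmetric = TRUE)

# Keep the first k components (k is an illustrative choice).
k <- 2
scores <- eg$vectors[, 1:k] %*% diag(sqrt(eg$values[1:k]))  # sample coordinates
sdev   <- sqrt(eg$values[1:k] / (n - 1))                    # component std. deviations

# Sanity check on the toy data: prcomp() works from an SVD of the data matrix
# and should give the same scores up to the sign of each component.
ref <- prcomp(X, center = TRUE, scale. = FALSE)
```

With samples in rows, `prcomp()` itself never forms the loci-by-loci matrix either, so calling it directly on the samples-by-loci matrix may already be enough if the data fit in memory.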

Letin
  • Maybe it's just me, but I find your vocabulary ambiguous when describing your data. The convention is that columns correspond to *variables* and rows correspond to *observations*. Don't you have 700 variables (or dimensions) and 4,000,000 observations (or points)? If so, then you should not transpose your data before passing it to `princomp`. – flodel Sep 28 '12 at 00:08
  • @flodel I am not an expert, but I believe in genetics it is very common for that to be reversed. – joran Sep 28 '12 at 00:12
  • You might find this interesting: http://en.wikipedia.org/wiki/Principal_component_analysis#Iterative_computation. – flodel Sep 28 '12 at 00:31
  • Read about the FactoMineR package in bioconductor site. I believe it is capable of handling large data sets. – user1021713 Sep 28 '12 at 04:55
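
The iterative computation linked in the comments can be illustrated with plain power iteration, which finds the first principal component using only matrix-vector products, so no loci-by-loci matrix is ever formed. This is a minimal sketch on made-up data: the sizes, the planted rank-one signal in `X`, and the fixed 100 iterations are all assumptions chosen so that the toy example converges quickly, not a tuned implementation.

```r
# Toy data (assumption): 20 samples x 500 loci with one strong planted component.
set.seed(42)
n <- 20; p <- 500
X  <- 3 * outer(rnorm(n), rnorm(p)) + matrix(rnorm(n * p), n, p)
Xc <- scale(X, center = TRUE, scale = FALSE)   # centre each locus (column)

# Start from a random unit vector of loadings.
r <- rnorm(p)
r <- r / sqrt(sum(r^2))

# Power iteration: repeatedly apply t(Xc) %*% Xc to r without forming that p x p matrix.
for (i in 1:100) {
  s <- crossprod(Xc, Xc %*% r)        # t(Xc) %*% (Xc %*% r), a p x 1 matrix
  r <- as.vector(s) / sqrt(sum(s^2))  # renormalise to unit length
}

# Sample coordinates on the first principal component.
pc1_scores <- as.vector(Xc %*% r)
```

Further components would need a deflation step after each one, and convergence slows down when the leading eigenvalues are close together.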
