I have a very large training set (~2 GB) in a CSV file. The file is too large to read directly into memory (read.csv() brings the computer to a halt), and I would like to reduce the dimensionality of the data using PCA. The problem is that, as far as I can tell, I need to read the file into memory in order to run a PCA algorithm (e.g., princomp()).
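For reference, this is the straightforward in-memory approach that I can't use on a file this size (the file name is just a placeholder):

    # Works fine on small files, but grinds the machine to a halt here
    dat <- read.csv("train.csv")
    pc  <- princomp(dat)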
I have tried the bigmemory package to read the file in as a big.matrix, but princomp() doesn't work on big.matrix objects, and it doesn't seem like a big.matrix can be converted into something like a data.frame.
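Roughly, this is what I tried (backing and descriptor file names are placeholders):

    library(bigmemory)

    # Read the CSV as a file-backed big.matrix so it doesn't have to
    # fit in RAM all at once
    dat <- read.big.matrix("train.csv", header = TRUE, type = "double",
                           backingfile = "train.bin",
                           descriptorfile = "train.desc")

    # This is where I get stuck: princomp() won't accept a big.matrix,
    # and pulling everything back in with as.data.frame(dat[, ]) would
    # just run out of memory again
    pc <- princomp(dat)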
Is there a way of running princomp() on a large data file that I'm missing?
I'm a relative novice at R, so some of this may be obvious to more seasoned users (apologies in advance).
Thanks for any info.