I am selecting a subset of a data.frame g.raw, like this:
g.raw <- read.table(gfile, sep = ",", header = FALSE, row.names = 1)
snps <- intersect(row.names(na.omit(csnp.raw)), row.names(na.omit(esnp.raw)))
g <- g.raw[snps, ]
It works. However, that last line is EXTREMELY slow. g.raw has about 18M rows and snps has about 1M entries. I realize these are pretty large numbers, but this seems like a simple operation, and reading g.raw into memory as a data.frame wasn't a problem (it took a few minutes), whereas the subsetting step described above is taking hours.
How do I speed this up? All I want is to shrink g.raw a lot.
Thanks!
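
One approach that may help, offered as a sketch rather than a definitive fix: as far as I understand, indexing a data.frame by character row names goes through partial matching, which scales poorly when both the table and the name vector are large. Looking up integer positions first with match() and subsetting by those positions avoids that step. The toy data below is made up purely for illustration (the sizes, the column names V2 and V3, and the "rs" name prefix are assumptions, not taken from the real files):

```r
# Toy stand-in for g.raw: a data.frame with character row names.
# Sizes and names here are illustrative assumptions only.
set.seed(1)
n <- 10000
g.raw <- data.frame(V2 = rnorm(n), V3 = rnorm(n),
                    row.names = paste0("rs", seq_len(n)))

# Toy stand-in for snps: a subset of the row names, in arbitrary order.
snps <- sample(row.names(g.raw), 1000)

# match() gives the integer position of each name in the row names,
# using exact (hashed) lookup instead of partial matching.
idx <- match(snps, row.names(g.raw))

# Subset by integer position; rows come back in the order of snps.
g <- g.raw[idx, , drop = FALSE]
```

Note that match() returns NA for any name in snps that is not a row name of g.raw, so with real data it may be worth dropping those first, e.g. idx <- idx[!is.na(idx)].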