From the web site pointed to in a previous question, it appears that you want to represent a matrix of about 10000 x 72000 integers:
> print(object.size(integer(10000 * 72000)), units="Mb")
2746.6 Mb
which should be 'easy' with the 8 GB you reference in another question. Also, the total length is less than the maximum vector length in R, so that should be OK too. But see the end of the response for an important caveat!
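The arithmetic behind that figure can be checked by hand. This is a minimal sketch, assuming 4-byte integers and the classic 2^31 - 1 vector length limit; the variable names are mine, for illustration only.

```r
n_user <- 10000
n_film <- 72000
n_cells <- n_user * n_film          # 7.2e8 cells (held as a double, so no integer overflow)
bytes_per_int <- 4                  # assumption: R integers are 32-bit
size_mb <- n_cells * bytes_per_int / 2^20
round(size_mb, 1)                   # 2746.6, matching object.size() up to a small header
n_cells < .Machine$integer.max      # TRUE: below 2^31 - 1
```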
I created, outside R, a tab-delimited version of the data file. I then read in the information I was interested in
what <- list(User=integer(), Film=integer(), Rating=numeric(), NULL)
x <- scan(fl, what)
The 'NULL' drops the unused timestamp data. The 'User' and 'Film' entries are not sequential, and numeric() on my platform takes up twice as much memory as integer(), so I converted User and Film to factor, and Rating to integer() by doubling (original scores are 1 to 5 in increments of 1/2).
x <- list(User=factor(x$User), Film=factor(x$Film),
          Rating=as.integer(2 * x$Rating))
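To see what the factor conversion buys you, here is a small sketch on made-up ids (the `toy` values are hypothetical, not taken from the real file). The factor keeps the original ids as labels while its integer codes run 1, 2, 3, ..., which is what makes them usable as row and column indices later.

```r
# Toy data with invented, non-sequential ids
toy <- list(User = c(7L, 42L, 7L, 1000L),
            Film = c(300L, 15L, 15L, 300L),
            Rating = c(3.5, 5, 0.5, 4))
user <- factor(toy$User)
levels(user)                           # "7" "42" "1000": original ids kept as labels
as.integer(user)                       # 1 2 1 3: compact sequential codes
rating <- as.integer(2 * toy$Rating)   # 7 10 1 8: half-point scores become whole numbers
```

Doubling is exact here because multiples of 1/2 are represented exactly in floating point, so as.integer() does not truncate anything.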
I then allocated the matrix
ratings <- matrix(NA_integer_,
                  nrow=length(levels(x$User)),
                  ncol=length(levels(x$Film)),
                  dimnames=list(levels(x$User), levels(x$Film)))
and used the fact that a two-column matrix can be used to index another matrix
ratings[cbind(x$User, x$Film)] <- x$Rating
This is the step where memory use is at its maximum. I'd then remove the unneeded variable
rm(x)
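The two-column indexing used to fill `ratings` can be tried on a toy matrix first. This is only an illustrative sketch with invented values; each row of `idx` is one (row, column) pair, and cells that never receive a rating stay NA.

```r
m <- matrix(NA_integer_, nrow = 3, ncol = 2,
            dimnames = list(c("7", "42", "1000"), c("15", "300")))
idx <- cbind(c(1, 2, 1, 3), c(2, 1, 1, 2))   # one (row, column) pair per rating
m[idx] <- c(7L, 10L, 1L, 8L)                 # assign all ratings in one vectorised step
m
```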
The gc() function tells me how much memory I've used...
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 140609 7.6 407500 21.8 350000 18.7
Vcells 373177663 2847.2 450519582 3437.2 408329775 3115.4
... a little over 3 Gb, so that's good.
Having done that, you'll now run into serious problems. kmeans (from your response to questions on an earlier answer) will not work with missing values
> m = matrix(rnorm(100), 5)
> m[1,1]=NA
> kmeans(m, 2)
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
and as a very rough rule of thumb I'd expect ready-made R solutions to require 3-5 times as much memory as the starting data size. Have you worked through your analysis with a smaller data set?
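Applying that rule of thumb to the numbers above gives a feel for the problem. This is a back-of-the-envelope sketch, not a measurement; the 3-5 multiplier is only the rough guess stated above.

```r
data_mb <- 2746.6                 # size reported by object.size() earlier
needed_mb <- data_mb * c(3, 5)    # rough rule of thumb, not a measured requirement
needed_mb / 1024                  # the same estimate in Gb
needed_mb > 8 * 1024              # TRUE TRUE: both ends of the range exceed 8 Gb
```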