In my current project I have a calculation function that runs on one element of a vector A and returns a list element that I insert into list B. The returned element contains a number of large, arbitrarily sized matrices that relate back to vector A.
As an example, let's take a function that takes a number n and generates a random n x n matrix.
vector.A <- sample(1:2000, 15000, replace = TRUE)  # 15000 sizes, each up to 2000
list.B <- as.list(rep(NA, length(vector.A)))       # NA marks "not yet computed"
arbitraryMatrix <- function(n) {
  matrix(rnorm(n * n), ncol = n, nrow = n)
}
for ( i in which(is.na(list.B)) ) {  # skips elements already filled, so the loop is resumable
  print(i)
  list.B[[i]] <- arbitraryMatrix( vector.A[i] )
}
This loop slows down the larger list.B gets (in fact I'm pretty sure it will crash R before it finishes). It occurred to me that no element of list.B is ever accessed again after it's created, so it could be written to disk rather than accumulating in memory and slowing down the calculations.
I could write a script that does this by saving chunks into .rda files, but I was hoping someone had a more elegant solution.
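To be concrete, here's a rough sketch of the sort of thing I mean, using saveRDS with one file per element instead of chunked .rda files (the output directory, file naming, and the TRUE flag are just placeholders I made up):

out.dir <- tempdir()  # placeholder scratch directory
for ( i in which(is.na(list.B)) ) {
  res <- arbitraryMatrix( vector.A[i] )
  saveRDS(res, file = file.path(out.dir, sprintf("element_%05d.rds", i)))
  list.B[[i]] <- TRUE  # keep only a tiny flag in memory so which(is.na(...)) still makes the loop resumable
}

This works, but it leaves me managing 15000 files by hand, which is why I'd prefer something more elegant.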
The ff package looked like an interesting possibility for this (http://cran.r-project.org/web/packages/ff/ff.pdf), but as far as I can tell it doesn't support list objects.
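From my reading of the docs, a single disk-backed matrix seems straightforward, roughly like the sketch below (the vmode/dim call is my assumption from the manual); what I can't see is how to hold a list of 15000 differently sized ones:

library(ff)
m <- ff(vmode = "double", dim = c(1000, 1000))  # one disk-backed 1000 x 1000 matrix
m[1, ] <- rnorm(1000)                           # reads and writes go through the backing file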
Caveats:
- I'm using a for loop because I like to be able to repair bugs that arise on the 7000th iteration without having to rerun the first 6999 iterations unnecessarily.
- Depending on your machine, edit the parameters of the code until it runs, but only slowly, on your computer.
- The actual problem I have takes a list as its input, so I'm not interested in vectorising the arbitraryMatrix function.
- The memory problem is compounded in my actual problem as the function uses a lot of memory (it involves subsetting data frames).
EDIT: I'm considering the mmap package, which maps R objects to temporary files, but I'm still trying to work out how to use it for this problem.
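From the mmap documentation, the basic pattern appears to be to pre-allocate a file of the right size and then map it, roughly as below (real64() and the indexing behaviour are my assumptions from the manual, not something I've tested at scale):

library(mmap)
tmp <- tempfile()
writeBin(numeric(1000 * 1000), tmp)   # pre-allocate a file big enough for a 1000 x 1000 double matrix
m <- mmap(tmp, mode = real64())       # map the file into memory
m[1:1000] <- rnorm(1000)              # writes go straight to the file
munmap(m)

What I haven't worked out is how to get from this flat-vector view back to a list of matrices of varying sizes.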