
I have recently discovered the wonders of the bigmemory, ff and filehash packages for handling very large matrices.

How can I handle very large (300MB+) lists? I work with these lists all day, every day. I can band-aid a solution with save() and load() hacks everywhere, but I would prefer a bigmemory-like approach. Something like bigmemory's big.matrix would be ideal: I work with it essentially as I would an ordinary matrix, yet it takes up something like 660 bytes in my RAM.


These are mostly lists of length >1000 holding lm() objects (or similar regression objects). For example,

Y <- rnorm(1000) ; X <- rnorm(1000)
A <- lapply(1:6000, function(i) lm(Y~X))
B <- lapply(1:6000, function(i) lm(Y~X))
C <- lapply(1:6000, function(i) lm(Y~X))
D <- lapply(1:6000, function(i) lm(Y~X))
E <- lapply(1:6000, function(i) lm(Y~X))
F <- lapply(1:6000, function(i) lm(Y~X))

In my project I will have A-, B-, C-, D-, E-, and F-type lists (and even more than this) that I have to work with interactively.

If these were gigantic matrices there would be a tonne of support. I was wondering whether any package offers similar support for large list objects.

Jase
  • Are you asking for suggestions about which package to use? – GSee Sep 25 '12 at 14:56
  • 2
    Appears is too vague to support "best practices" advice. Describing the nature of these "lists" would be needed. Perhaps one of the database interfaces. Coding advice requires better task description. – IRTFM Sep 25 '12 at 16:35
  • 1
    Look at the [SOAR](http://cran.r-project.org/web/packages/SOAR/index.html) package perhaps? – mnel Sep 26 '12 at 00:48
  • 6
    Do you need all of the `"lm"` object content? If only the coefficients were needed, say, then it would be possible to represent each list of '"lm"` objects as a matrix. Also you might be able to use the faster `lm.fit`: `sapply(1:6000, function(i) coef(lm.fit(cbind(1, X), Y)))` – G. Grothendieck Sep 26 '12 at 02:17
  • @G.Grothendieck That's good advice that I'll take on board. However I would still like a resolution to the question for the cases of regression objects like `rq()` and other such objects where this type of solution isn't available. – Jase Sep 26 '12 at 02:18
  • 1
    Try: `sapply(1:3, function(i) coef(rq.fit(cbind(1, X), Y)))` – G. Grothendieck Sep 26 '12 at 02:24
  • 4
    Living more dangerously, you also might be able to hack the fitted objects to erase the big pieces (i.e., the original data are typically stored independently in each regression fit). Most accessor methods would still work without the data, and you could restore the data in that slot if you needed to. (Admittedly this is still not a general-purpose answer to your question, to which it looks like the answer is "no".) – Ben Bolker Oct 22 '12 at 21:05
  • This is a very good pseudo-solution! – Jase Oct 23 '12 at 14:07
  • It may be a kludge, but you could hash your objects with the digest package and add the hashed strings to a big.matrix – Róisín Grannell Dec 25 '12 at 07:49
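
A minimal sketch of the SOAR approach mnel suggests, assuming SOAR's default local cache (an .R_Cache directory under the working directory). Store() serializes an object to disk, drops it from the workspace, and lazy-loads it back on first access, which is close to the "660 bytes in RAM" behaviour asked for:

library(SOAR)

Y <- rnorm(1000) ; X <- rnorm(1000)
A <- lapply(1:6000, function(i) lm(Y~X))

Store(A)        # write A to the .R_Cache directory and drop it from RAM
Objects()       # list the objects currently held in the cache
coef(A[[1]])    # first access lazy-loads A back from disk

Note that accessing A reloads the whole list, so the memory saving only lasts while the object sits unused.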
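And a minimal sketch of Ben Bolker's stripping hack, assuming only coefficient-level accessors such as coef() and summary() are needed afterwards; predict() on new data may need components this removes, so test it against your own workflow:

Y <- rnorm(1000) ; X <- rnorm(1000)
fit <- lm(Y~X)

slim <- fit
slim$model <- NULL                     # drop the embedded model frame (usually the big piece)
environment(slim$terms) <- baseenv()   # drop the captured formula environment too

print(object.size(fit))                # compare the two footprints
print(object.size(slim))
coef(slim) ; summary(slim)             # common accessors still work without the data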

1 Answer


You can store and access lists on disk using the filehash package. This should work (if rather slowly, at least on my machine...):

Y <- rnorm(1000) ; X <- rnorm(1000)

# set up disk object
library(filehash)
dbCreate("myTestDB")
db <- dbInit("myTestDB")

# each assignment serializes the list to disk instead of holding it in RAM
db$A <- lapply(1:6000, function(i) lm(Y~X))
db$B <- lapply(1:6000, function(i) lm(Y~X))
db$C <- lapply(1:6000, function(i) lm(Y~X))
db$D <- lapply(1:6000, function(i) lm(Y~X))
db$E <- lapply(1:6000, function(i) lm(Y~X))
db$F <- lapply(1:6000, function(i) lm(Y~X))

List items can be accessed using the `[` function. See the filehash vignette for more details: http://cran.r-project.org/web/packages/filehash/vignettes/filehash.pdf
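
For example, a hypothetical follow-on session, assuming the database created above:

fits <- db$A        # pull the whole list stored under key "A" back into RAM
coef(fits[[1]])     # then work with it like an ordinary list of fits

dbList(db)          # list every key currently in the database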

Ben