
I have recently discovered the wonders of the bigmemory, ff and filehash packages for handling very large matrices.

How can I handle very large (300MB+) lists? I work with these lists all day, every day. I can band-aid a solution with save() and load() hacks everywhere, but I would prefer a bigmemory-like approach. Something like bigmemory's big.matrix would be ideal: I work with it essentially as I would an ordinary matrix, yet it takes up something like 660 bytes in my RAM.


These are mostly lists of length >1000 holding lm() objects (or similar regression objects). For example,

Y <- rnorm(1000) ; X <- rnorm(1000)
A <- lapply(1:6000, function(i) lm(Y~X))
B <- lapply(1:6000, function(i) lm(Y~X))
C <- lapply(1:6000, function(i) lm(Y~X))
D <- lapply(1:6000, function(i) lm(Y~X))
E <- lapply(1:6000, function(i) lm(Y~X))
F <- lapply(1:6000, function(i) lm(Y~X))

In my project I will have A-, B-, C-, D-, E-, and F-type lists (and even more than this) that I have to work with interactively.

If these were gigantic matrices there would be a tonne of support. I was wondering whether any package offers similar support for large list objects.

Jase
  • Are you asking for suggestions about which package to use? – GSee Sep 25 '12 at 14:56
  • 2
    Appears is too vague to support "best practices" advice. Describing the nature of these "lists" would be needed. Perhaps one of the database interfaces. Coding advice requires better task description. – IRTFM Sep 25 '12 at 16:35
  • 1
    Look at the [SOAR](http://cran.r-project.org/web/packages/SOAR/index.html) package perhaps? – mnel Sep 26 '12 at 00:48
  • 6
    Do you need all of the `"lm"` object content? If only the coefficients were needed, say, then it would be possible to represent each list of '"lm"` objects as a matrix. Also you might be able to use the faster `lm.fit`: `sapply(1:6000, function(i) coef(lm.fit(cbind(1, X), Y)))` – G. Grothendieck Sep 26 '12 at 02:17
  • @G.Grothendieck That's good advice that I'll take on board. However I would still like a resolution to the question for the cases of regression objects like `rq()` and other such objects where this type of solution isn't available. – Jase Sep 26 '12 at 02:18
  • 1
    Try: `sapply(1:3, function(i) coef(rq.fit(cbind(1, X), Y)))` – G. Grothendieck Sep 26 '12 at 02:24
  • 4
    Living more dangerously, you also might be able to hack the fitted objects to erase the big pieces (i.e., the original data are typically stored independently in each regression fit). Most accessor methods would still work without the data, and you could restore the data in that slot if you needed to. (Admittedly this is still not a general-purpose answer to your question, to which it looks like the answer is "no".) – Ben Bolker Oct 22 '12 at 21:05
  • This is a very good pseudo-solution! – Jase Oct 23 '12 at 14:07
  • It may be a kludge, but you could hash your objects with the digest package and add the hashed strings to a big.matrix – Róisín Grannell Dec 25 '12 at 07:49
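
A minimal sketch of the SOAR approach mnel suggests, assuming SOAR's default local cache (an .R_Cache directory under the working directory). Store() serializes an object to disk, drops it from the workspace, and lazy-loads it back on first access, which is close to the "660 bytes in RAM" behaviour asked for:

library(SOAR)

Y <- rnorm(1000) ; X <- rnorm(1000)
A <- lapply(1:6000, function(i) lm(Y~X))

Store(A)        # write A to the .R_Cache directory and drop it from RAM
Objects()       # list the objects currently held in the cache
coef(A[[1]])    # first access lazy-loads A back from disk

Note that accessing A reloads the whole list, so the memory saving only lasts while the object sits unused.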
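And a minimal sketch of Ben Bolker's stripping hack, assuming only coefficient-level accessors such as coef() and summary() are needed afterwards; predict() on new data may need components this removes, so test it against your own workflow:

Y <- rnorm(1000) ; X <- rnorm(1000)
fit <- lm(Y~X)

slim <- fit
slim$model <- NULL                     # drop the embedded model frame (usually the big piece)
environment(slim$terms) <- baseenv()   # drop the captured formula environment too

print(object.size(fit))                # compare the two footprints
print(object.size(slim))
coef(slim) ; summary(slim)             # common accessors still work without the data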

1 Answer


You can store and access lists on disk using the filehash package. This should work (if rather slowly, at least on my machine...):

Y <- rnorm(1000) ; X <- rnorm(1000)

# set up disk object
library(filehash)
dbCreate("myTestDB")
db <- dbInit("myTestDB")

# each assignment serializes the list to disk instead of holding it in RAM
db$A <- lapply(1:6000, function(i) lm(Y~X))
db$B <- lapply(1:6000, function(i) lm(Y~X))
db$C <- lapply(1:6000, function(i) lm(Y~X))
db$D <- lapply(1:6000, function(i) lm(Y~X))
db$E <- lapply(1:6000, function(i) lm(Y~X))
db$F <- lapply(1:6000, function(i) lm(Y~X))

List items can be accessed using the `[` function. See the filehash vignette for more details: http://cran.r-project.org/web/packages/filehash/vignettes/filehash.pdf
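
For example, a hypothetical follow-on session, assuming the database created above:

fits <- db$A        # pull the whole list stored under key "A" back into RAM
coef(fits[[1]])     # then work with it like an ordinary list of fits

dbList(db)          # list every key currently in the database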

Ben