Under what circumstances does the readRDS() function in R try to load packages/namespaces? I was surprised to see the following in a fresh R session:

> loadedNamespaces()
[1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "stats"    
[7] "tools"     "utils"    
> x <- readRDS('../../../../data/models/my_model.rds')
There were 19 warnings (use warnings() to see them)
> loadedNamespaces()
 [1] "base"         "class"        "colorspace"   "data.table"  
 [5] "datasets"     "dichromat"    "e1071"        "earth"       
 [9] "evaluate"     "fields"       "formatR"      "gbm"         
[13] "ggthemes"     "graphics"     "grDevices"    "grid"        
[17] "Iso"          "knitr"        "labeling"     "lattice"     
[21] "lubridate"    "MASS"         "methods"      "munsell"     
[25] "plotmo"       "plyr"         "proto"        "quantreg"    
[29] "randomForest" "RColorBrewer" "reshape2"     "rJava"       
[33] "scales"       "spam"         "SparseM"      "splines"     
[37] "stats"        "stringr"      "survival"     "tools"       
[41] "utils"        "wra"          "wra.ops"      "xlsx"        
[45] "xlsxjars"     "xts"          "zoo"     

If any of those new packages isn't installed, the readRDS() call fails.

The 19 warnings mentioned are:

> warnings()
Warning messages:
1: replacing previous import ‘hour’ when loading ‘data.table’
2: replacing previous import ‘last’ when loading ‘data.table’
3: replacing previous import ‘mday’ when loading ‘data.table’
4: replacing previous import ‘month’ when loading ‘data.table’
5: replacing previous import ‘quarter’ when loading ‘data.table’
6: replacing previous import ‘wday’ when loading ‘data.table’
7: replacing previous import ‘week’ when loading ‘data.table’
8: replacing previous import ‘yday’ when loading ‘data.table’
9: replacing previous import ‘year’ when loading ‘data.table’
10: replacing previous import ‘here’ when loading ‘plyr’
11: replacing previous import ‘hour’ when loading ‘data.table’
12: replacing previous import ‘last’ when loading ‘data.table’
13: replacing previous import ‘mday’ when loading ‘data.table’
14: replacing previous import ‘month’ when loading ‘data.table’
15: replacing previous import ‘quarter’ when loading ‘data.table’
16: replacing previous import ‘wday’ when loading ‘data.table’
17: replacing previous import ‘week’ when loading ‘data.table’
18: replacing previous import ‘yday’ when loading ‘data.table’
19: replacing previous import ‘year’ when loading ‘data.table’

So apparently it's loading something like lubridate and then data.table, generating namespace conflicts as it goes.

FWIW, unserialize() gives the same results.

What I really want is to load these objects without also loading everything the person who saved them had loaded at the time, which is what it looks like it's doing.

Update: here are the classes in the object x:

> classes <- function(x) {
    cl <- c()
    for(i in x) {
      cl <- c(cl, if(is.list(i)) c(class(i), classes(i)) else class(i))
    }
    cl
  }
> unique(classes(x))
 [1] "list"              "numeric"           "rq"               
 [4] "terms"             "formula"           "call"             
 [7] "character"         "smooth.spline"     "integer"          
[10] "smooth.spline.fit"

rq is from the quantreg package; all the rest are from base or stats.

Ken Williams
  • I'm guessing that the objects that are loaded in are of classes that provoke this cascade of package-loading ... what is `class(x)` ?? – Ben Bolker Oct 02 '13 at 21:03
  • I'm not at work now, but when I inspected `class`es I couldn't find anything interesting. Just `numeric`s and stuff. I'll check again in the morning. – Ken Williams Oct 03 '13 at 04:10
  • `x` appears to be a saved model (if the file name is relevant). How was `x` created? – mnel Oct 03 '13 at 04:22
  • @BenBolker I updated the question with that info. – Ken Williams Oct 03 '13 at 14:36
  • @mnel Yes, it's a saved model, created by someone who's not around anymore. I don't like that lots of arbitrary code is required (and executed) just by loading these models though, especially since they're not even necessary. – Ken Williams Oct 03 '13 at 14:39
  • next question: do any of the objects you loaded have non-trivial environments??? (`quantreg` loads `SparseM`, but that obviously doesn't explain everything ...) – Ben Bolker Oct 03 '13 at 15:07
  • @BenBolker Not sure how to find out. I'm guessing maybe the `call` object has an environment associated with it, but I can't figure out how to view it. – Ken Williams Oct 04 '13 at 01:50
  • `if (!is.null(e <- environment(callObj))) ls(env=e)` ? – Ben Bolker Oct 04 '13 at 02:20
  • If I do `str(x)`, I get a pretty large output, and in three places I see things like `.. ..- attr(*, ".Environment")=`. Doing `ls(env=e)` and `class(get('foo',env=e))` (for all values of `foo`) on those three environments shows nothing interesting. The `call` objects (there are two of them) have no associated environments. – Ken Williams Oct 11 '13 at 16:36
  • You need to post a link to the serialized object if you want a specific answer. – IRTFM Oct 30 '13 at 16:20
  • @DWin that's not gonna happen, this is a business object that I can't share. That's why I asked in my original question - "*under what circumstances*" does this happen - in hopes someone could say "oh, that happens when blah blah blah." – Ken Williams Oct 30 '13 at 18:03
  • Do you have a reproducible example? No need to share your business data --> a small example with generated data would be the best. Or use a data frame from your business but replace the real data with some random crap. – lebatsnok Nov 04 '13 at 20:03
  • But the "under what circumstances" part is probably answerable. The model you're saving includes a few environments, and one of these has a complicated `parent.env` (having the aforementioned packages in the family tree). An environment is not complete without its parents, so they have to be loaded too. And a solution could be to (a) find the trouble-maker, and (b) remove it :) (For example, by setting its parent env to baseenv() or emptyenv().) – lebatsnok Nov 04 '13 at 20:10

2 Answers

Ok. This may not be a useful answer (that would need more details), but I think it at least answers the "under what circumstances..." part.

First of all, I think it is not specific to readRDS but works the same way with any save'd objects that can be load'ed.

The "under what circumstances" part: when the saved object contains an environment having a package/namespace environment as a parent. Or when it contains a function whose environment is a package/namespace environment.

require(Matrix)
foo <- list(
   a = 1,
   b = new.env(parent=environment(Matrix)),
   c = "c")
save(foo, file="foo.rda")
loadedNamespaces()   # Matrix is there!
detach("package:Matrix")
unloadNamespace("Matrix")
loadedNamespaces()   # no Matrix there!
load("foo.rda")
loadedNamespaces()   # Matrix is back again

And the following works too:

require(Matrix)
bar <- list(
   a = 1,
   b = force,
   c = "c")
environment(bar$b) <- environment(Matrix)
save(bar, file="bar.rda")
loadedNamespaces()      # Matrix is there!
detach("package:Matrix")
unloadNamespace("Matrix")
loadedNamespaces()      # no Matrix there!
load("bar.rda")
loadedNamespaces()      # Matrix is back!

I haven't tried it, but there's no reason why it shouldn't work the same way with saveRDS/readRDS. As for the solution: if it does no harm to the saved objects (i.e., if you're sure that the environments are not actually needed), you can cut the link to the package by setting the environment's parent.env to something harmless, such as baseenv(). So using the foo above,

parent.env(foo$b) <- baseenv()
save(foo, file="foo.rda")
loadedNamespaces()        # Matrix is there ....
unloadNamespace("Matrix")
loadedNamespaces()        # no Matrix there ...
load("foo.rda")
loadedNamespaces()        # still no Matrix ...
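
The same effect can be checked with saveRDS()/readRDS() on a formula, which is closer to the `terms`/`formula` objects in the question. This is only a minimal sketch under some assumptions: `make_formula` is a made-up stand-in for package code, and `splines` is used purely because it ships with R and is not loaded in a fresh session.

```r
# Hypothetical stand-in for a function defined inside a package:
# its environment is set to the 'splines' namespace by hand.
make_formula <- function() y ~ x
environment(make_formula) <- asNamespace("splines")

# The formula's .Environment is the call frame of make_formula(),
# whose parent is the 'splines' namespace.
fm <- make_formula()

is_loaded <- function() "splines" %in% loadedNamespaces()
path <- tempfile(fileext = ".rds")

# Save, unload the namespace, then read the object back.
saveRDS(fm, path)
unloadNamespace("splines")
after_unload <- is_loaded()
fm2 <- readRDS(path)
after_read <- is_loaded()

# Re-parent the formula before saving and repeat the round trip.
environment(fm) <- baseenv()
saveRDS(fm, path)
unloadNamespace("splines")
fm3 <- readRDS(path)
after_fix <- is_loaded()

print(c(after_unload = after_unload,
        after_read   = after_read,
        after_fix    = after_fix))
```

If the mechanism described above is right, only `after_read` should come out TRUE. Note that resetting a formula's environment is only safe when nothing later needs to look up variables through it.
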
lebatsnok
  • I like it. `parent.env` was the piece I was missing. In our case I do think there's loads of stuff we should be able to purge from the parent namespaces. Thanks. – Ken Williams Nov 06 '13 at 13:07

One painful workaround I've come up with is to cleanse the object of any environments it had attached to it, by a nasty eval:

sanitizeEnvironments <- function(obj) {
    tc <- textConnection(NULL, 'w')
    dput(obj, tc)
    source(textConnection(textConnectionValue(tc)))$value
}

I can take the old object, run it through this function, then do saveRDS() on it again. Then loading the new object doesn't blow chunks all over my namespace.
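
Before wiping everything, it may help to see which environments the object actually carries and which namespaces sit in their parent chains. The following is a rough sketch, not a complete walker: `find_envs` and `env_chain` are hypothetical helper names, and only lists, attributes, and closures are inspected.

```r
# Collect every environment reachable through list elements,
# attributes, and closure environments, keyed by a readable path.
find_envs <- function(obj, path = "x") {
  found <- list()
  if (is.environment(obj)) {
    found[[path]] <- obj
    return(found)
  }
  if (is.function(obj) && !is.primitive(obj)) {
    found[[paste0("environment(", path, ")")]] <- environment(obj)
  }
  a <- attributes(obj)
  for (nm in names(a)) {
    found <- c(found, find_envs(a[[nm]], paste0(path, "@", nm)))
  }
  if (is.list(obj)) {
    for (i in seq_along(obj)) {
      found <- c(found, find_envs(obj[[i]], paste0(path, "[[", i, "]]")))
    }
  }
  found
}

# Names of the named environments from e up to the empty environment;
# anonymous environments (empty name) are dropped.
env_chain <- function(e) {
  out <- character()
  while (!identical(e, emptyenv())) {
    out <- c(out, environmentName(e))
    e <- parent.env(e)
  }
  out[nzchar(out)]
}

# Usage on an object read with readRDS():
# lapply(find_envs(x), env_chain)
```

Any package name that shows up in a chain is a candidate for re-parenting with `parent.env<-` or `environment<-`, which is less drastic than the dput() round trip.
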

Ken Williams
  • ugh. At least it confirms that the package-loading-triggering stuff is indeed hiding in the environments ... – Ben Bolker Oct 30 '13 at 18:45