
Relevant background info

I've built a small piece of software that can be customized via a config file. The config file is parsed and translated into a nested environment structure (e.g. .HIVE$db = an environment, .HIVE$db$user = "Horst", .HIVE$db$pw = "my password", .HIVE$regex$date = some regex for dates, etc.).

I've built routines that can handle those nested environments (e.g. look up the value "db/user" or "regex/date", change it, etc.). The thing is that the initial parsing of the config files takes a long time and results in quite a big object (actually three to four of them, between 4 and 16 MB). So I thought "No problem, let's just cache them by saving the object(s) to .Rdata files". This works, but loading the cached objects makes the RAM consumption of my Rterm process go through the roof (over 1 GB!) and I still don't really understand why. This doesn't happen when I compute the objects from scratch, but that's exactly what I'm trying to avoid since it takes too long.

I already thought about maybe serializing it, but I haven't tested it as I would need to refactor my code a bit. Plus I'm not sure if it would affect the "loading back into R" part in just the same way as loading .Rdata files.

Question

Can anyone tell me why loading a previously computed object has such effects on memory consumption of my Rterm process (compared to computing it in every new process I start) and how best to avoid this?

If desired, I will also try to come up with an example, but it's a bit tricky to reproduce my exact scenario. Yet I'll try.

Rappster
  • Trying to reproduce a toy example might point you to why this is happening. At least that is how it works for me. – Andrew Redd Oct 31 '11 at 17:08

2 Answers


It's likely because the environments you are creating are carrying around their ancestors. If you don't need the ancestor information, then set the parents of such environments to emptyenv() (or just don't use environments if you don't need them).

Also note that formulas (and, of course, functions) have environments so watch out for those too.
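A minimal sketch of the effect, assuming the cached environments are created inside functions that also hold large temporary objects. The helper `make_cache()` and its `scratch` variable are hypothetical and only serve to illustrate the point; `serialize()` is used here to measure how many bytes would be written for each environment.

```r
# Hypothetical helper: creates an environment inside a function whose frame
# also holds a large temporary object.
make_cache <- function(detach_parent) {
  scratch <- numeric(1e5)  # roughly 800 KB that the cache itself never needs
  cache <- if (detach_parent) new.env(parent = emptyenv()) else new.env()
  assign("user", "Horst", envir = cache)
  cache
}

heavy <- make_cache(detach_parent = FALSE)  # parent is the function's frame
light <- make_cache(detach_parent = TRUE)   # parent is emptyenv()

# Number of bytes that serializing each environment produces
heavy_size <- length(serialize(heavy, NULL))
light_size <- length(serialize(light, NULL))

print(c(heavy = heavy_size, light = light_size))
```

In this sketch `heavy` drags the function frame (including `scratch`) along when it is serialized, while `light` does not. One trade-off to keep in mind: code evaluated inside an environment whose parent is emptyenv() cannot see any functions, not even those from base, so this suits environments used purely as data containers.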

G. Grothendieck
  • +1 That's a deep insight. Is there some way to spot this - just look for anything but R_EmptyEnv from `parent.env()`? I am interested in checking out if there's anything fishy going on in my own environments. FWIW, I noticed that `?environment` reports that `The replacement function 'parent.env<-' is extremely dangerous...` (see the help for the full warning) and may be removed. – Iterator Oct 31 '11 at 17:30
  • This sounds quite likely. +1 for pointing to a use for `emptyenv()`. – Josh O'Brien Oct 31 '11 at 17:32
  • Thanks, that might really be part of the problem as I was too lazy to always include the `parent=emptyenv()` statement. I will be back with more once I tried this – Rappster Oct 31 '11 at 18:27
  • @Iterator, zapping parents really only applies if you are storing the environments (or objects which contain environments). If you are just using environments, then ancestors would not normally be a problem. – G. Grothendieck Nov 01 '11 at 19:06
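
On the question of how to spot this, one possible approach is to walk the chain of enclosing environments with `parent.env()` until `emptyenv()` is reached. The function `env_ancestors()` below is a hypothetical helper, not part of base R; it is only a sketch of that idea.

```r
# Hypothetical helper: lists the ancestors of an environment, nearest first.
# Unnamed environments (e.g. function frames) are shown as "<anonymous>".
env_ancestors <- function(env) {
  labels <- character()
  while (!identical(env, emptyenv())) {
    env <- parent.env(env)
    name <- environmentName(env)
    labels <- c(labels, if (nzchar(name)) name else "<anonymous>")
  }
  labels
}

detached <- new.env(parent = emptyenv())
attached <- new.env(parent = globalenv())

print(env_ancestors(detached))  # only the empty environment
print(env_ancestors(attached))  # global env, attached packages, base, empty env
```

An `"<anonymous>"` entry in the output is the thing to look out for when the environment is going to be saved, since it usually means a function frame is part of the chain.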

If it's not reproducible by others, it will be hard to answer. However, I do something quite similar to what you're doing, yet I use JSON files to store all of my values. Rather than parse the text, I use RJSONIO to convert everything to a list, and getting stuff from a list is very easy. (You could, if you want, convert to a hash, but it's nice to have layers of nested parameters.)

See this answer for an example of how I've done this kind of thing. If that works out for you, then you can forego the expensive translation step and the memory ballooning.

(Taking a stab at the original question...) I wonder if your issue is that you are using an environment rather than a list. Saving environments might be tricky in some contexts. Saving lists is no problem. Try using a list or try converting to/from an environment. You can use the as.list() and as.environment() functions for this.
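
A rough sketch of such a conversion, with the caveat that `as.list()` on an environment only converts the top level, so nested environments need a recursive walk. The helpers `env_to_list()` and `list_to_env()` and the toy `hive` object are hypothetical; the sketch assumes every level is fully named and that the leaves are plain atomic values.

```r
# Hypothetical helper: turn a nested environment into a nested list.
# all.names = TRUE keeps entries whose names start with a dot.
env_to_list <- function(x) {
  if (is.environment(x)) x <- as.list(x, all.names = TRUE)
  if (is.list(x)) lapply(x, env_to_list) else x
}

# Hypothetical helper: rebuild nested environments from a nested, fully
# named list. Every new environment gets emptyenv() as its parent.
list_to_env <- function(x) {
  if (!is.list(x)) return(x)
  env <- new.env(parent = emptyenv())
  for (nm in names(x)) assign(nm, list_to_env(x[[nm]]), envir = env)
  env
}

# Toy structure loosely modeled on the one described in the question
hive <- new.env(parent = emptyenv())
hive$db <- new.env(parent = emptyenv())
hive$db$user <- "Horst"
hive$regex <- new.env(parent = emptyenv())
hive$regex$date <- "^\\d{4}-\\d{2}-\\d{2}$"

as_list  <- env_to_list(hive)     # plain nested list, suitable for caching
restored <- list_to_env(as_list)  # nested environments again
```

The list form could then be what gets written to the cache file, with the environments rebuilt after loading.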

Iterator
  • Thanks for the pointer. Thought about JSON objects as well. Initially, I was using lists, but from a conceptual point of view I like environments more in that respect, as they perfectly map how you would store files in a directory tree (e.g. Windows Explorer), ensuring there are no duplicate files (which might happen easily in lists). Also, I found expanding/cutting an existing nested structure is far easier using environments than using lists, but that might also just be due to not doing it in the smartest possible way ;-) – Rappster Oct 31 '11 at 18:32