59

I have an .RData file containing various objects:

 New.Rdata
  |_ Object 1  (e.g. data.frame)
  |_ Object 2  (e.g. matrix)
  |_...
  |_ Object n

Of course I can load the whole file with load('New.Rdata'). However, is there a smart way to load only one specific object out of this file and discard the others?

epo3
Seb

5 Answers

78

.RData files don't have an index (the contents are serialized as one big pairlist). You could hack a way to go through the pairlist and assign only entries you like, but it's not easy since you can't do it at the R level.

However, you can simply convert the .RData file into a lazy-load database which serializes each entry separately and creates an index. The nice thing is that the loading will be on-demand:

# convert .RData -> .rdb/.rdx
e = local({load("New.RData"); environment()})
tools:::makeLazyLoadDB(e, "New")

Loading the DB then only loads the index but not the contents. The contents are loaded as they are used:

lazyLoad("New")
ls()
x # if you had x in the New.RData it will be fetched now from New.rdb

Just like with load() you can specify an environment to load into so you don't need to pollute the global workspace etc.
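
As a rough sketch of that last point, the snippet below builds a throwaway .RData file, converts it, and lazy-loads a single object into a fresh environment. The temporary paths and the object names `df` and `m` are made up for illustration, and `tools:::makeLazyLoadDB` is an internal function, so its behaviour may differ between R versions. The `filter` argument of lazyLoad() restricts which names are registered.

```r
# Build a small example file in a temporary directory (illustrative names only).
dir <- tempfile("lazydemo")
dir.create(dir)
rdata <- file.path(dir, "New.RData")
df <- data.frame(a = 1:3)
m <- matrix(1:4, nrow = 2)
save(df, m, file = rdata)

# Convert .RData -> .rdb/.rdx (internal, unexported function).
src <- local({load(rdata); environment()})
base <- file.path(dir, "New")
tools:::makeLazyLoadDB(src, base)

# Register only "df" in a separate environment; nothing touches the workspace.
target <- new.env()
lazyLoad(base, envir = target, filter = function(nm) nm == "df")

ls(target)          # only the filtered name is visible
one <- target$df    # the value is fetched from New.rdb on first access
```

Objects excluded by the filter are never read from the .rdb file, so their size should not matter for this call.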

Simon Urbanek
  • But the look-up will still require a serial access through "New.RData" to get "x", right? So if "x" is at the end of "New.RData", there might be no time savings? Question2: won't there be memory taken up with the other objects encountered as the unserialization process works its way through "New.RData"? – IRTFM Jan 02 '12 at 17:14
  • No, the lookup just seeks into `New.rdb` at the beginning of `x` and loads only `x`. – Simon Urbanek Jan 02 '12 at 17:49
  • What are the chances of these functions becoming less internal? – hadley Jan 03 '12 at 18:08
  • This does not work if the data files are larger. – Reeza Feb 09 '21 at 23:14
19

You can use attach() rather than load(). This attaches the file's contents to the search path; you can then copy the one object you are interested in and detach the .RData file.

This still loads everything, but is simpler to work with than loading everything into the global workspace (possibly overwriting things you don't want overwritten) then getting rid of everything you don't want.
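
A minimal sketch of this approach, wrapped in a helper so the detach step cannot be forgotten. The function name `attach_one` and the example objects are hypothetical; the sketch relies on attach() naming the search-path entry "file:<path>" when given a file name and on it returning the attached environment invisibly.

```r
# Hypothetical helper: attach an .RData file, copy one object, detach again.
attach_one <- function(file, name) {
  env <- attach(file, warn.conflicts = FALSE)
  # Detach by explicit name, which is safer than relying on position 2.
  on.exit(detach(paste0("file:", file), character.only = TRUE))
  get(name, envir = env, inherits = FALSE)
}

# Example data, saved to a temporary file and then removed from the workspace.
path <- tempfile(fileext = ".RData")
someObj <- 1:5
other <- letters
save(someObj, other, file = path)
rm(someObj, other)

someObj <- attach_one(path, "someObj")
exists("other")   # "other" was never copied into the workspace
```

The whole file is still read, as noted above; the gain is only that nothing else lands in the global workspace.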

Greg Snow
  • This answer is good but would be more complete with an example, because detaching the .RData object from the search path is not intuitive. Example to retrieve `someObj` from `someFile.RData`: `attach('someFile.RData'); someObj <- someObj; detach('file:someFile.RData')` – C8H10N4O2 Jun 27 '17 at 13:28
  • @C8H10N4O2, your example is good (and very explicit). But if you attach in the default position (2) and don't attach anything else before calling detach, then the defaults work and you can just call `detach()` without any arguments and it will detach the file. This is quicker and simpler; your approach is safer. – Greg Snow Jun 27 '17 at 15:14
7

Simon Urbanek's answer is very, very nice. A drawback is that it doesn't seem to work if an object to be saved is too large:

tools:::makeLazyLoadDB(
  local({
    x <- 1:1e+09
    cat("size:", object.size(x), "\n")
    environment()
  }), "lazytest")
size: 4e+09 
Error: serialization is too large to store in a raw vector

I'm guessing that this is due to a limitation of the current implementation of R (I have 2.15.2) rather than running out of physical memory and swap. The saves package might be an alternative for some uses, however.

Mars
5

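A small function is useful for extracting a single object without loading everything in the RData file into your workspace. Note that load() still reads the whole file; the other objects just end up in a temporary environment that is discarded afterwards.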

extractorRData <- function(file, object) {
  # Extract one object from an .RData file created by R's save() command.
  # Inputs: path to the .RData file, name of the object (character string).
  E <- new.env()
  load(file = file, envir = E)
  get(object, envir = E, inherits = FALSE)
}

See full answer here. https://stackoverflow.com/a/65964065/4882696

GGAnderson
0

This blog post describes a neat practice that avoids this sort of issue in the first place. The gist of it is to use the saveRDS() and readRDS() functions instead of the regular save() and load() functions, so that each file holds exactly one object.
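
A short sketch of that practice, using a temporary file and a made-up data frame. Because readRDS() returns the object instead of restoring it under its saved name, you choose the name at load time.

```r
# One object per file: write it with saveRDS() ...
path <- tempfile(fileext = ".rds")
df <- data.frame(id = 1:3, label = c("a", "b", "c"))
saveRDS(df, path)

# ... and read it back under whatever name you like.
restored <- readRDS(path)
str(restored)
```

With one file per object, loading only the object you need is just a matter of picking the right file.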