6

I often find myself transferring workspaces to different scratch drives when one computing system is down or busy, or when I want to run two long-running packages simultaneously to save time, which means loading the same workspace twice in different places.

Because of this, I'd really love a way to see which objects differ between workspaces and a way to combine them, adding only the new, changed or updated objects to a similar workspace. This would be extremely useful for me.

So far I am relying on manual note-taking and getting befuddled by my scribbles two weeks down the line. I really just want to learn some good working practices and habits that make this sort of thing easier.

Generally, I would really like to learn more about workspace management and how experienced users keep workspaces for long, ongoing projects comprehensive and tidy. I often use RStudio, but when working remotely or on our HPC system it can be a bit laggy and clunky, so I tend to use the command line and interactive sessions.

I think maybe making lists of objects might be the key, but I'd like to be able to annotate things more easily, maybe with the data and parameters used to make the object etc.

Thanks.

jksl

3 Answers

4

I think one needs to build one's own function here, doing the following:

  • loading the workspaces one after another, using:

    load()
    
  • renaming each element of the workspace to prevent it being overwritten when another workspace is loaded, or putting it into a list

  • checking the timestamp of the workspaces with:

    file.info()
    
  • and keeping only the newest objects, which are then to be saved in some up-to-date workspace

Example:

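# Create ten example workspaces, each holding one object called `dummy`
for(i in 1:10){
    dummy <- rnorm(1)
    Sys.sleep(1.3) # pause so that each file gets a distinct modification time
    save(dummy,file=paste("test",i,".Rdata",sep=""))
}

DUMMY <- list()
timestamps <- NULL

# Load each workspace, keeping its `dummy` and the file's modification time
for(i in 1:10){
    filename <- paste("test",i,".Rdata",sep="")
    load(filename)
    DUMMY[[i]] <- dummy
    timestamps[i] <- file.info(filename)$mtime
}

# Keep only the object from the most recently modified file
uptodate <- unlist(timestamps)==max(unlist(timestamps))
dummy <- unlist(DUMMY[uptodate])
save(dummy,file="uptodate.Rdata")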
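
The steps above can also be turned into a small inventory of what each file contains. This is only a sketch: `inventory_workspaces` is a made-up helper name, and the two example files are written to a temporary directory purely for illustration. It relies on `load()` returning the names of the objects it restored and on `file.info()` for the file's modification time.

```r
# Hypothetical helper: list every object stored in a set of .RData files,
# together with the modification time of the file it came from.
inventory_workspaces <- function(files) {
  rows <- lapply(files, function(f) {
    env <- new.env()
    objs <- load(f, envir = env)  # load() returns the names of the loaded objects
    data.frame(
      file   = rep(basename(f), length(objs)),
      object = objs,
      class  = vapply(objs, function(n) class(get(n, envir = env))[1],
                      character(1), USE.NAMES = FALSE),
      mtime  = rep(file.info(f)$mtime, length(objs)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Two small example workspaces in a temporary directory
tmp <- tempdir()
x <- 1:3
y <- "a"
save(x, file = file.path(tmp, "inv1.RData"))
save(x, y, file = file.path(tmp, "inv2.RData"))

inv <- inventory_workspaces(file.path(tmp, c("inv1.RData", "inv2.RData")))
inv[, c("file", "object", "class")]
```

Note that the timestamp is still per file, not per object, so this only tells you which workspace was saved last.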
petermeissner
  • Yes, great! I did not know about file.info. It does not work for objects within workspaces; is there an equivalent for that? This would be especially good if there were an easy way to add suffixes/prefixes to every file in a workspace. I think that would have to be done separately before loading? Maybe I will browse and comment again if I can find a nice way. – jksl Oct 06 '12 at 15:14
  • "[...] is there an equivalent for that?" No, I don't think so. Objects do not have a creation/modification attribute as far as I know. Maybe you can add an object to the workspaces containing this information (via date(), e.g.). – petermeissner Oct 06 '12 at 15:36
3

I think the key thing is to load your workspaces into separate environments, then figure out how you want to merge them (if at all).

First, let's make some objects to save.

set.seed(1)
a <- data.frame(1:10, 1:10)
b <- rnorm(10)

One way to keep track of when an object was created is to set an attribute. The downside is that you have to remember to update it when you update your object. (See the last part of the post for alternatives.)

d <- structure(data.frame(b), updated=Sys.time())
attr(d, 'updated')
#[1] "2012-10-06 12:34:06 CDT"

You can assign the current time to a variable just before saving the workspace, so that you know when you saved it (file.info, which PeterM suggested, may be a better alternative).

updated <- Sys.time() 
dir.create('~/tmp') # create a directory to save workspace in.
save.image('~/tmp/ws1.RData')

d[1, 1] <- 1 #make a change to `d`
attr(d, "updated") <- Sys.time() # don't forget to update the `updated` attribute
e <- b * a # add a new object
updated <- Sys.time()
save.image('~/tmp/ws2.RData')

Now clear the workspace and load the saved workspaces. Instead of loading them into .GlobalEnv, load each one into its own environment:

rm(list=ls(all=TRUE)) # clear .GlobalEnv
w1 <- new.env()
w2 <- new.env()
load('~/tmp/ws1.RData', envir=w1)
load('~/tmp/ws2.RData', envir=w2)

> ls(w1)
[1] "a"       "b"       "d"       "updated"
> ls(w2)
[1] "a"       "b"       "d"       "e"       "updated"

> with(w1, updated)
[1] "2012-10-06 12:34:09 CDT"
> with(w2, updated)
[1] "2012-10-06 12:35:02 CDT"

> attr(w1$d, 'updated')
[1] "2012-10-06 12:34:06 CDT"
> attr(w2$d, 'updated')
[1] "2012-10-06 12:35:02 CDT"
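
Once each workspace sits in its own environment, finding and merging the differences takes a few lines. The following is a minimal sketch, not a definitive implementation: `diff_objects`, `ws_old` and `ws_new` are names invented for this illustration, and two small hand-built environments stand in for loaded workspaces. Objects are compared with `identical()`, which also compares attributes, so an object whose only difference is a timestamp attribute is reported as changed.

```r
# Two environments standing in for two loaded workspaces
ws_old <- new.env()
ws_new <- new.env()
assign("a", 1:10, envir = ws_old); assign("b", "old", envir = ws_old)
assign("a", 1:10, envir = ws_new); assign("b", "new", envir = ws_new)
assign("e", 3.14, envir = ws_new)

# Hypothetical helper: names in `from` that are missing from, or differ in, `into`
diff_objects <- function(from, into) {
  nms <- ls(from, all.names = TRUE)
  is_new <- !nms %in% ls(into, all.names = TRUE)
  is_changed <- vapply(nms, function(n) {
    exists(n, envir = into, inherits = FALSE) &&
      !identical(get(n, envir = from), get(n, envir = into))
  }, logical(1), USE.NAMES = FALSE)
  list(new = nms[is_new], changed = nms[is_changed])
}

delta <- diff_objects(from = ws_new, into = ws_old)
delta$new      # objects that exist only in ws_new
delta$changed  # objects present in both but not identical

# Copy the new and changed objects from ws_new into ws_old
for (n in c(delta$new, delta$changed)) {
  assign(n, get(n, envir = ws_new), envir = ws_old)
}
```

This always lets `ws_new` win. If either side may hold the newer version, you would need a timestamp (such as the `updated` attribute) to decide the direction of the copy.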

You may be interested in a function like .ls.objects (a user-defined helper, not part of base R).

> .ls.objects(pos=w1)
              Type Size PrettySize Rows Columns
a       data.frame  872    [1] 872   10       2
b          numeric  168    [1] 168   10      NA
d       data.frame 1224   [1] 1224   10       1
updated    POSIXct  312    [1] 312    1      NA
> .ls.objects(pos=w2)
              Type Size PrettySize Rows Columns
a       data.frame  872    [1] 872   10       2
b          numeric  168    [1] 168   10      NA
d       data.frame 1224   [1] 1224   10       1
e       data.frame 1032   [1] 1032   10       2
updated    POSIXct  312    [1] 312    1      NA

You could use a custom wrapper around assign to keep track of when objects were updated.

myAssign <- function(x, value, ...) {
  attr(value, "updated") <- Sys.time()
  assign(x, value, ...)
}

> myAssign("b", w1$b[1:2], pos=w1)
> w1$b
[1] -0.6264538  0.1836433
attr(,"updated")
[1] "2012-10-06 12:44:55 CDT"
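
The same wrapper idea extends to recording how an object was made. This is a sketch under assumptions: `annotatedAssign`, the `notes` attribute and the example values are all hypothetical names chosen for illustration. The round trip at the end shows that attributes set this way are kept when the object goes through `save()` and `load()`.

```r
# Hypothetical wrapper: like myAssign(), but also stores free-form notes
annotatedAssign <- function(x, value, notes = list(), envir = parent.frame()) {
  attr(value, "updated") <- Sys.time()
  attr(value, "notes") <- notes
  assign(x, value, envir = envir)
}

notes_env <- new.env()
annotatedAssign("fit_input", rnorm(5),
                notes = list(source = "simulated", n = 5),
                envir = notes_env)

# Round trip through save()/load(): the attributes come back with the object
rt_file <- tempfile(fileext = ".RData")
save(list = "fit_input", envir = notes_env, file = rt_file)
restored <- new.env()
load(rt_file, envir = restored)
attr(restored$fit_input, "notes")
```

Bear in mind that many operations that build a new object (subsetting a vector, for example) drop custom attributes, so the notes have to be reattached after such steps.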

Finally, if you want to get fancy, you can make an active binding so that your object always gets a fresh `updated` attribute whenever it changes.

f <- local({
  delayedAssign('x', stop('object not found'))
  function(v) {
    if (!missing(v)) x <<- structure(v, updated=Sys.time())
    x
  }
})
makeActiveBinding('ab', f, .GlobalEnv)
> ab # Error, nothing has been assigned to it yet
Error in function (v)  : object not found
> ab <- data.frame(1:10, y=rnorm(10))
> attr(ab, 'updated')
[1] "2012-10-06 12:46:53 CDT"
> ab <- data.frame(10:1, y=rnorm(10))
> attr(ab, 'updated')
[1] "2012-10-06 12:47:04 CDT"
GSee
  • +1 The other answers aren't wrong but I definitely think using environments is the cleanest way to accomplish the goal at hand. – Dason Oct 06 '12 at 18:02
  • I'm going to work through these examples. I wasn't aware of environments. Thanks – jksl Oct 06 '12 at 19:20
  • @jksl, see also `?environment`, `?get`, `?assign`. Also, you can use `as.list` on an environment to convert it to a list, which may be something you're more used to working with. – GSee Oct 06 '12 at 19:39
1

I can answer part of your question, but will leave the rest to others on SO.

Assuming your workspace has many objects, save the workspace before you quit R and rename the resulting .RData file to, say, work1.RData. If you are on Linux, you can rename the file like this:

mv .RData work1.RData

Then you open a new R session, create as many objects as you like and save it as before. You may rename this workspace if you want to carry it to other systems.

Now you have two workspace .RData binary files. You can load them into a single current workspace using

load ("work1.RData")

and afterwards check the loaded objects in your workspace like this

 ls() 
 objects()

Also save.image() will be useful in this case.
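
One thing to watch when loading straight into the current workspace: `load()` silently overwrites any existing object with the same name. It does return the names of the objects it restored, which helps you see what came in. A small sketch, using a throwaway file in a temporary directory (`ws_file`, `p` and `q` are illustrative names only):

```r
# Save two objects to a throwaway workspace file, then remove them
ws_file <- file.path(tempdir(), "work1_demo.RData")
p <- 1
q <- "two"
save(p, q, file = ws_file)
rm(p, q)

# An object with the same name as one stored in the file
p <- 99

loaded <- load(ws_file)  # character vector of the restored object names
loaded
p                        # now holds the value from the file, not 99
```

Loading into a separate environment with the `envir` argument avoids the overwrite.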

HTH

Sathish
  • Thank you, this is fairly similar to what I already do: after large enough changes or function runs, I do save.image(file="myData_1.X.RData") in increments, but often the vast majority of the objects are fairly similar, with a select few updated, re-run or new. I just wish ls() in R had more details available. If I do something similar in 3 different places, then combining just the changed objects is not quite as simple as suggested above. I guess I should be writing objects out as they are created, so that each gets a datestamp in the Linux file info. – jksl Oct 06 '12 at 17:40
  • Just realised that when you import saved out objects, the attributes aren't retained! – jksl Oct 06 '12 at 21:22