
I read in a public-use dataset and, while building a final dataframe from it, created dozens of temporary vectors. Since this dataframe will be analyzed as part of a larger process, I plan on sourcing the R script that creates it, but I do not want to leave myself or future users with a cluttered global environment.

I know that I can use `ls` to list the current objects in my global environment and `rm` to remove certain objects, but I'm unsure how to use those two functions together to remove all objects except the dataframe created by a certain script.

To clarify, here is a reproducible example:

Script 1, named "script1.R"

setwd("C:/R/project")
set.seed(12345)
var <- letters
for (i in var) {
  assign(i, runif(1))
}
df <- data.frame(x1 = a, x2 = b, x3 = c)

Script 2

source("script1.R")

It would be easy enough to remove all the vectors from the sourced script with some combination of `rm` and `ls` (e.g. using the `pattern` argument of `ls`), but what I want is a general function that removes ALL objects created by a given script and retains only the dataframe (in this example, df).

(NOTE: There are questions similar to this one here and here, but I feel mine is different in that it is more specific to sourcing and cleaning in the context of a multi-script project.)

Update: While looking around, I found the following link, which gave me a nice workaround:

How can I neatly clean my R workspace while preserving certain objects?

Specifically, user @Fojtasek suggested:

I would approach this by making a separate environment in which to store all the junk variables, making your data frame using with(), then copying the ones you want to keep into the main environment. This has the advantage of being tidy, but also keeping all your objects around in case you want to look at them again.

So I could just wrap the code that creates the dataframe as follows...

temp <- new.env()
with(temp, {
  var <- letters
  for (i in var) {
    assign(i, runif(1))
  }
  df <- data.frame(x1 = a, x2 = b, x3 = c)
})

... and then just extract the desired dataframe (df) to my global environment, but I'm curious if there are other elegant solutions, or if I'm thinking about this incorrectly.
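If editing script1.R to wrap its body in with() is inconvenient, base R's source() also accepts an environment in its `local` argument, so the whole script can run inside a scratch environment and only df is copied out. A minimal sketch: the temporary file stands in for script1.R (without its setwd() call), and the names `script` and `scratch` are illustrative.

```r
# Stand-in for script1.R, written to a temporary file so the sketch is self-contained
script <- tempfile(fileext = ".R")
writeLines(c(
  "set.seed(12345)",
  "var <- letters",
  "for (i in var) {",
  "  assign(i, runif(1))",
  "}",
  "df <- data.frame(x1 = a, x2 = b, x3 = c)"
), script)

# Evaluate the script inside a fresh environment instead of the global one
scratch <- new.env()
source(script, local = scratch)

# Copy only the data frame into the global environment;
# the temporaries stay inspectable in scratch (e.g. scratch$a)
df <- scratch$df
```

Because `scratch` is an ordinary environment, `rm(scratch)` discards all of the temporaries at once when they are no longer needed.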

Thanks.

mcjudd
  • Do you actually have the name of the variables you want to delete stored like you do in this example (via `var`)? – Dason Jan 28 '15 at 15:50
    I can't test this atm since I'm not in front of my R terminal, but since `ls` returns a vector, you can maybe do an `ls` before and after you `source`, and then iterate through and remove the differences except for the dataframe. – Ken Jan 28 '15 at 15:54
  • @Dason No I do not. They are all created in different ways. I just used that as an example to show how you might have several vectors in your global environment that were used to create a dataframe. – mcjudd Jan 28 '15 at 15:58
    Does this help? http://stackoverflow.com/questions/28142088/how-to-exclude-only-the-data-frames-from-the-global-environment-in-r/28142128#28142128 seems to be the same as what you're doing – Rich Scriven Jan 28 '15 at 17:00
  • It might make more sense to create a new environment, import just what you want, and pass the desired object to the global environment, preferably by returning it from a function. – Carl Witthoft Jan 28 '15 at 18:32
  • This question has been answered in this SO post [here](http://stackoverflow.com/questions/28142088/exclude-only-data-frames-from-the-global-environment-in-r) . The post focuses on data frames but the function can be applied to any type of object – rafa.pereira Sep 09 '15 at 10:27
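Carl Witthoft's suggestion of returning the desired object from a function could look like the following sketch, where `make_df()` is a hypothetical name. Everything created inside the function body, including the variables made by assign(), lives in the function's own frame and is discarded when the function returns.

```r
# Hypothetical wrapper: build the data frame inside a function so the
# temporaries never touch the global environment
make_df <- function() {
  set.seed(12345)
  var <- letters
  for (i in var) {
    assign(i, runif(1))  # assigns into the function's local frame
  }
  data.frame(x1 = a, x2 = b, x3 = c)  # last value is returned
}

df <- make_df()
```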

3 Answers


As an alternative approach (similar to @Ken's suggestion from the comments), the following code allows you to delete all objects created after a certain point, except one (or more) that you specify:

freeze <- ls() # all objects created after here will be deleted
var <- letters
for (i in var) {
    assign(i, runif(1))
}
df <- data.frame(x1 = a, x2 = b, x3 = c)
rm(list = setdiff(ls(), c(freeze, "df"))) # delete objects created since freeze, except df

The workhorse here is setdiff(), which returns the elements of its first argument that do not appear in its second. In this case, that is every object created after freeze, except df. As an added bonus, freeze itself is deleted as well, because ls() is evaluated before freeze is assigned, so the name "freeze" is not in the snapshot.
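The same snapshot-and-setdiff() idea can be wrapped into the kind of general, reusable function the question asks for. The sketch below assumes the script is sourced directly into the target environment; `source_keep()` and its arguments are hypothetical names, not an existing API.

```r
# Hypothetical helper: source a script into `envir`, then delete every
# object the script created there except the names listed in `keep`
source_keep <- function(file, keep, envir = globalenv()) {
  before <- ls(envir = envir)                  # snapshot, like `freeze` above
  source(file, local = envir)
  created <- setdiff(ls(envir = envir), c(before, keep))
  rm(list = created, envir = envir)
  invisible(created)                           # names that were removed
}
```

Usage would be `source_keep("script1.R", keep = "df")`. One caveat: an object that already existed before the call and was overwritten by the script is part of the snapshot, so it is kept rather than removed.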

Joe
  • Needless to say, I'm a fan. :) Thanks for mentioning setdiff()! – Ken Jan 28 '15 at 17:32
  • Aha, exactly what I was looking for! Simple, elegant, and using base R. The part about removing `freeze` as well is really slick. – mcjudd Jan 29 '15 at 02:28

This should work.

source(file="script1.R")
rm(list=ls()[!sapply(mget(ls(),.GlobalEnv), is.data.frame)])

Breaking it down:

  1. mget(ls(), .GlobalEnv) gets all the objects in the global environment
  2. !sapply(..., is.data.frame) determines which of them are not data frames
  3. rm(list=ls()[...]) removes only the objects that are not data frames
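A slightly more defensive variant of the same one-liner uses vapply(), which always returns a logical vector (logical(0) when the environment is empty), whereas sapply() returns an empty list in that case and `!` on a list is an error. Wrapping it in local() keeps the helper variables out of the global environment. Like the original, this removes every non-data-frame object in the global environment, not only those created by the sourced script; the demo objects below are illustrative.

```r
# Demo objects: one data frame to keep, two other objects to drop
keep_me <- data.frame(z = 1:3)
tmp_vec <- runif(5)
tmp_fun <- function(x) x + 1

local({
  objs <- ls(envir = globalenv())
  # TRUE for each global object that is a data frame
  is_df <- vapply(objs,
                  function(nm) is.data.frame(get(nm, envir = globalenv())),
                  logical(1))
  rm(list = objs[!is_df], envir = globalenv())
})
```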
John Paul

I have scripts like this save the result as an RDS file and then open the result in a new session (or alternatively, after clearing everything). That is,

a <- 1
saveRDS(a, file="a.RDS")
rm(list=ls())
a <- readRDS("a.RDS")
a
## [1] 1
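To run the building script in a genuinely new session rather than clearing the current one, one option is to launch it with Rscript via system2() and read back only the saved result. This is a sketch under two assumptions: the script ends by calling saveRDS() on its result, and Rscript lives in R's own bin directory. The temporary file names are stand-ins.

```r
# Stand-in for script1.R that saves its result instead of leaving it in memory
out <- tempfile(fileext = ".rds")
script <- tempfile(fileext = ".R")
writeLines(c(
  "set.seed(12345)",
  "for (i in letters) assign(i, runif(1))",
  "df <- data.frame(x1 = a, x2 = b, x3 = c)",
  sprintf("saveRDS(df, %s)", deparse(out))  # deparse() quotes and escapes the path
), script)

# Run the script in a separate R process; its temporaries die with that process
status <- system2(file.path(R.home("bin"), "Rscript"), shQuote(script))
stopifnot(status == 0)

df <- readRDS(out)
```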
Aaron