I read in a public-use dataset, and the process of building the final dataframe created dozens of temporary vectors. Since this dataframe will be analyzed as part of a larger process, I plan on sourcing the R script that creates the dataframe, but I do not want to leave myself or future users with a cluttered global environment.
I know that I can use ls to list the current objects in my global environment and use rm to remove certain objects, but I'm unsure of how to use those two functions in concert to remove all objects except the dataframe created by a certain script.
To clarify, here is a reproducible example:
Script 1, named "script1.R"
setwd("C:/R/project")
set.seed(12345)
var <- letters
for (i in var) {
  assign(i, runif(1))
}
df <- data.frame(x1 = a, x2 = b, x3 = c)
Script 2
source("script1.R")
It would be easy enough to remove all vectors created by the sourced script with some combination of rm and ls (with pattern = letters or something like that), but what I want to do is create a general function that removes ALL vectors created by a certain script and retains only the dataframe (in this example, df).
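To make the goal concrete, here is a minimal sketch of the kind of function I have in mind. It snapshots ls() before sourcing, then removes everything new except the names to keep. The helper name source_and_keep and its arguments are hypothetical, not base R functions, and the temporary file is only a stand-in for script1.R so the sketch runs on its own.

```r
# Hypothetical stand-in for script1.R, written to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c(
  "set.seed(12345)",
  "for (i in letters) assign(i, runif(1))",
  "df <- data.frame(x1 = a, x2 = b, x3 = c)"
), script)

# source_and_keep() is a hypothetical helper, not part of base R.
# It sources `file` into `envir` and then removes every object the script
# created there, except the names listed in `keep`.
source_and_keep <- function(file, keep, envir = globalenv()) {
  before <- ls(envir = envir)                 # objects that already existed
  source(file, local = envir)                 # evaluate the script in envir
  created <- setdiff(ls(envir = envir), before)
  rm(list = setdiff(created, keep), envir = envir)
  invisible(keep)
}

source_and_keep(script, keep = "df")
```

One limitation of this approach: an object that existed before the script ran and was overwritten by it does not count as new, so it is left in place with its new value.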
(NOTE: There are similar questions to this one here and here, but I feel mine is different in that it is more specific to sourcing and cleaning in the context of a multi-script project.)
Update: While looking around, I found a nice workaround at the following link:
How can I neatly clean my R workspace while preserving certain objects?
Specifically, user @Fojtasek suggested:
I would approach this by making a separate environment in which to store all the junk variables, making your data frame using with(), then copying the ones you want to keep into the main environment. This has the advantage of being tidy, but also keeping all your objects around in case you want to look at them again.
So I could just wrap the code that creates the dataframe as follows...
temp <- new.env()
with(temp, {
  var <- letters
  for (i in var) {
    assign(i, runif(1))
  }
  df <- data.frame(x1 = a, x2 = b, x3 = c)
})
... and then just extract the desired dataframe (df) to my global environment, but I'm curious whether there are other elegant solutions, or whether I'm thinking about this incorrectly.
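A variation on the same idea that leaves the sourced script untouched is to source it into the scratch environment and copy out only the dataframe. This is a rough sketch rather than a tested recipe for the real project: the temporary file is a hypothetical stand-in for script1.R, and it assumes the script does not write outside the environment it is evaluated in (no <<- or explicit assignments into the global environment).

```r
# Hypothetical stand-in for script1.R, written to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c(
  "set.seed(12345)",
  "for (i in letters) assign(i, runif(1))",
  "df <- data.frame(x1 = a, x2 = b, x3 = c)"
), script)

temp <- new.env()              # scratch environment for the junk variables
source(script, local = temp)   # the script's objects are created in temp
df <- temp$df                  # copy only the dataframe into the caller
rm(temp)                       # optional: drop the scratch environment
```

Skipping the final rm(temp) keeps the intermediate vectors available for inspection, which is the advantage mentioned in the quoted answer.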
Thanks.