Naming conflicts in R when using attach

Question

In R I constantly seem to run into weird naming conflicts between attached dataframes and other objects, attaches/detaches not working as expected (I just had two copies of the same dataframe attached, and I'm not even sure whether they were identical) and a whole host of issues specific to loosely typed languages. Code that worked an hour ago suddenly produces new errors, and so on.

Is there a best practice for dealing with this sort of stuff? Am I missing efficiency if I stick to naming dataframes with single letters and then not attaching at all?

You should definitely not be using attach() at all; it's usually a recipe for confusion. — joran, Oct 21 '14 at 04:13
The reason you don't need attach() is that many commands (like `lm`) have a `data=` argument where you can pass the data.frame name, and R will resolve the variables in your formula within that data.frame. Alternatively, you can use a function like `with()` to avoid having to re-type the data.frame name a bunch of times. At the very least, be sure to use `detach()` to remove every data.frame you `attach()` as soon as you are done with it. — MrFlick, Oct 21 '14 at 04:54

Richie Cotton · Accepted Answer · 2014-10-21T07:08:45.520

attaches/detaches (sic) not working as expected

As mentioned by joran and BondedDust, using attach is always a bad idea, because it causes silly, obscure bugs like the ones you found.
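To make the kind of bug concrete, here is a minimal sketch of two ways attach can bite. The data frame `scores` and the global `x` are made up for illustration; the point is only that a global variable wins over an attached column of the same name, and that the attached copy does not follow later changes to the data frame.

```r
# Hypothetical data, used only for illustration
scores <- data.frame(x = c(1, 2, 3))
x <- c(10, 20, 30)               # a global variable that shares a column name

attach(scores)                   # R reports that x is masked by .GlobalEnv
mean_seen <- mean(x)             # uses the global x, not scores$x

scores$x <- scores$x * 100       # change the data frame after attaching it
attached_x <- get("x", pos = "scores")  # the attached copy still holds the old values

detach(scores)                   # always clean up the search path
```

Here `mean_seen` is computed from the global `x`, and `attached_x` is the snapshot taken when `attach` was called, so neither reflects the current `scores$x`.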

naming dataframes with single letters

Don't do this either! Give your variables meaningful names, so that your code is easier to understand when you come back to it six months later.

If your problem is that you don't like repeatedly typing the name of a data frame to access columns, then use functions with special evaluation that avoid that need.

For example,

some_sample_data <- data.frame(x = 1:10, y = runif(10))

Subsetting

Repeated typing, hard work:

some_sample_data[some_sample_data$x > 3 & some_sample_data$y > 0.5, ]

Easier alternative using subset:

subset(some_sample_data, x > 3 & y > 0.5)
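As a rough check that the two forms select the same rows, the sketch below compares them, and also shows a third spelling that puts `with()` inside the brackets. The seed is an assumption added only to make the example reproducible.

```r
set.seed(1)  # assumed seed, only so the random column is reproducible
some_sample_data <- data.frame(x = 1:10, y = runif(10))

# Three ways to ask for the same rows
by_brackets <- some_sample_data[some_sample_data$x > 3 & some_sample_data$y > 0.5, ]
by_subset   <- subset(some_sample_data, x > 3 & y > 0.5)
by_with     <- some_sample_data[with(some_sample_data, x > 3 & y > 0.5), ]

same_as_subset <- isTRUE(all.equal(by_brackets, by_subset))
same_as_with   <- isTRUE(all.equal(by_brackets, by_with))
```

One difference worth knowing: `subset` silently drops rows where the condition is `NA`, whereas plain bracket indexing returns a row of `NA`s for them. This data has no missing values, so the results agree.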

Reordering

Repeated typing, hard work:

order_y <- order(some_sample_data$y)
some_sample_data[order_y, ]

Easier using arrange from plyr:

library(plyr)  # arrange() comes from plyr
arrange(some_sample_data, y)

Transforming

Repeated typing, hard work:

some_sample_data$z <- some_sample_data$x + some_sample_data$y

Easier using with, within or mutate (the last one from plyr):

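# with() evaluates the expression using the data frame's columns
some_sample_data$z <- with(some_sample_data, x + y)
# within() returns a modified copy of the data frame
some_sample_data <- within(some_sample_data, z <- x + y)
# mutate() comes from plyr, so library(plyr) must be loaded first
some_sample_data <- mutate(some_sample_data, z = x + y)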
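If you would rather not depend on plyr, base R's `transform()` does a similar job to `mutate` for simple cases. The following is a small sketch, using fixed values for `y` (an assumption, so the result is reproducible), that checks the base R approaches agree.

```r
# Fixed y values instead of runif(), purely so the example is deterministic
some_sample_data <- data.frame(x = 1:10, y = seq(0.1, 1, by = 0.1))

via_with      <- with(some_sample_data, x + y)            # returns a vector
via_within    <- within(some_sample_data, z <- x + y)     # returns a data frame
via_transform <- transform(some_sample_data, z = x + y)   # returns a data frame

same_z <- isTRUE(all.equal(via_within$z, via_with)) &&
  isTRUE(all.equal(via_transform$z, via_with))
```

Note that `with` gives back just the computed vector, so you still assign it to a column yourself, while `within` and `transform` give back a whole data frame.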

Modelling

As mentioned by MrFlick, many functions, particularly modelling functions, have a data argument that lets you avoid repeating the data name.

Repeated typing, hard work:

lm(some_sample_data$y ~ some_sample_data$x)

Using a data argument:

lm(y ~ x, data = some_sample_data)
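The data argument is more than a typing shortcut. As a sketch (the seed and `new_points` are assumptions for illustration), both calls give the same fit, but only the model built with `data =` predicts sensibly for new data, because the `$` version keeps looking up the original columns.

```r
set.seed(42)  # assumed seed, only for reproducibility
some_sample_data <- data.frame(x = 1:10, y = runif(10))

model_dollar <- lm(some_sample_data$y ~ some_sample_data$x)
model_data   <- lm(y ~ x, data = some_sample_data)

# Same coefficients, apart from their names
same_fit <- isTRUE(all.equal(unname(coef(model_dollar)), unname(coef(model_data))))

new_points <- data.frame(x = c(11, 12))  # hypothetical new observations

# Works as expected: one prediction per new row
predictions <- predict(model_data, newdata = new_points)

# The $ version ignores new_points and warns about the row mismatch
stale_predictions <- suppressWarnings(predict(model_dollar, newdata = new_points))
```

Comparing `length(predictions)` with `length(stale_predictions)` shows the problem: the second one has a value for every row of the original data rather than for the new points.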

You can see all the functions in the stats package that have a data argument using:

library(sig)
stats_sigs <- list_sigs(pkg2env(stats))
Filter(function(fn) "data" %in% names(fn$args), stats_sigs)

It should be mentioned that generally `subset()` is a bad idea in non-interactive, programmatic use: http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset — landroni, May 27 '16 at 09:47

score 1 · Answer 2 · answered Oct 21 '14 at 05:59

It is better to keep a series of data objects in a new environment. For example, I normally create an environment named e with this command.

e <- new.env()

Then you can access the individual objects in the environment with e$your_var.

The other benefits:

You can use eapply on the elements of the environment.
ls(e)                        # list the objects stored in e
rm(list = ls(e), envir = e)  # remove every object stored in e
It avoids conflicts between your local variables and the function variables that you want to create 5 ...
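Putting those pieces together, a minimal sketch might look like the following. The object names `heights` and `weights` are hypothetical, chosen only to show that a variable inside `e` does not clash with a local variable of the same name.

```r
e <- new.env()

# Hypothetical objects stored in the environment
e$heights <- c(150, 160, 170)
e$weights <- c(50, 60, 70)

# A local variable with the same name does not interfere with e$heights
heights <- "a local value"

stored <- sort(ls(e))        # names of the objects held in e
means  <- eapply(e, mean)    # named list with one result per object

rm(list = ls(e), envir = e)  # clear out the environment when finished
remaining <- length(ls(e))
```

`eapply` returns a named list whose order is not guaranteed, so it is safer to pick results out by name, for example `means$heights`.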
