26

Many introductory R books and guides start by attaching a data.frame so that you can call its variables by name. I have always preferred to access variables with $ notation or square-bracket indexing such as [,2]. That way I can work with multiple data.frames without confusing them, and I can iterate over the columns of interest. I noticed that Google recently posted coding guidelines for R which included the line

1) attach: avoid using it

How do people feel about this practice?
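For readers unfamiliar with the two access styles mentioned above, here is a minimal sketch. The data frames df_a and df_b and their columns are made up for illustration and do not come from any guide.

```r
# Two small, made-up data frames that share column names.
df_a <- data.frame(x = 1:3, y = c(2, 4, 6))
df_b <- data.frame(x = 4:6, y = c(1, 1, 1))

# $ notation makes it explicit which data frame each x comes from.
total_x <- sum(df_a$x) + sum(df_b$x)

# Bracket indexing by name allows iterating over columns of interest.
col_means <- sapply(names(df_a), function(nm) mean(df_a[, nm]))
```

Because every reference names its data frame, nothing here depends on what happens to be on the search path.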

Richard Erickson
kpierce8

7 Answers

25

I never use attach(); with() and within() are your friends.

Example code:

> N <- 3
> df <- data.frame(x1=rnorm(N),x2=runif(N))
> df$y <- with(df,{
   x1+x2
 })
> df
          x1         x2          y
1 -0.8943125 0.24298534 -0.6513271
2 -0.9384312 0.01460008 -0.9238312
3 -0.7159518 0.34618060 -0.3697712
> 
> df <- within(df,{
   x1.sq <- x1^2
   x2.sq <- x2^2
   y <- x1.sq+x2.sq
   x1 <- x2 <- NULL
 })
> df
          y        x2.sq     x1.sq
1 0.8588367 0.0590418774 0.7997948
2 0.8808663 0.0002131623 0.8806532
3 0.6324280 0.1198410071 0.5125870

Edit: hadley mentions transform in the comments. Here is some code:

 > transform(df, xtot=x1.sq+x2.sq, y=NULL)
       x2.sq       x1.sq       xtot
1 0.41557079 0.021393571 0.43696436
2 0.57716487 0.266325959 0.84349083
3 0.04935442 0.004226069 0.05358049
Eduardo Leoni
  • 3
    `transform` is another useful variation on within. – hadley Aug 23 '09 at 14:24
  • 1
    Actually I just noticed that unlike `attach()`, `with()` doesn't "resolve through" functions. First set up `printx <- function() { print(x) }`. Now, `with(list(x=42), printx())` fails even though `with(list(x=42), print(x))` and `attach(list(x=42)); printx()` succeed! :( – j_random_hacker Sep 20 '11 at 12:52
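The scoping difference described in the comment above can be sketched as follows. This assumes a fresh session with no x in the workspace. The helper returns x rather than printing it, and the name "demo_x" is an illustrative label for the attached list.

```r
# Hypothetical helper: its free variable x is looked up from where the
# function was defined (here, the workspace), not from where it is called.
printx <- function() x

# with() evaluates its expression with the list as a local scope, but
# printx() does not see that scope, so the lookup of x fails.
with_result <- tryCatch(
  with(list(x = 42), printx()),
  error = function(e) "x not found"
)

# attach() puts the list on the search path, which printx() does reach.
attach(list(x = 42), name = "demo_x")
attach_result <- printx()
detach("demo_x", character.only = TRUE)
```

This is ordinary lexical scoping: with() only changes where the expression itself is evaluated, while attach() changes what every function can see.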
13

I much prefer to use with to obtain the equivalent of attach on a single command:

 with(someDataFrame,  someFunction(...))

This also leads naturally to a form where subset is the first argument:

 with(subset(someDataFrame,  someVar > someValue),
      someFunction(...))

which makes it pretty clear that we operate on a selection of the data. And while many modelling functions have both data and subset arguments, the use above is more consistent, as it also applies to functions that do not have data and subset arguments.
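As a concrete sketch of the pattern above, using the mtcars data set that ships with R (chosen here only for illustration):

```r
# Mean fuel economy of four-cylinder cars: subset first, then evaluate
# the expression with the subset's columns in scope.
four_cyl_mean <- with(subset(mtcars, cyl == 4), mean(mpg))

# The same quantity via explicit indexing, for comparison.
check <- mean(mtcars$mpg[mtcars$cyl == 4])
```

mean() has no data or subset argument, so this is a case where the with(subset(...)) form fills the gap.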

Dirk Eddelbuettel
8

The main problem with attach is that it can result in unwanted behaviour. Suppose you have an object named xyz in your workspace. Now you attach data frame abc, which has a column named xyz. If your code refers to xyz, can you guarantee whether it refers to the object or to the data frame column? If you don't use attach, it is easy: plain xyz refers to the object, and abc$xyz refers to the column of the data frame.

One of the main reasons that attach is used frequently in textbooks is that it shortens the code.
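A minimal sketch of the name clash described above, with made-up values. The workspace is searched before attached data frames, so the bare name finds the workspace object.

```r
xyz <- 100                            # object in the workspace
abc <- data.frame(xyz = c(1, 2, 3))   # data frame with a column of the same name

attach(abc)          # R reports that xyz is masked by the workspace object
bare_value <- xyz    # the bare name finds the workspace object, not the column
detach(abc)

column_value <- abc$xyz   # unambiguous: always the data frame column
```

If the workspace object were created or removed later, the meaning of the bare xyz would silently change, which is the kind of surprise the answer warns about.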

Thierry
  • I have noticed that some textbooks say "don't do this, attach is being used to simplify the examples". – Michelle Jan 26 '12 at 16:01
7

"Attach" is an evil temptation. The only place where it works well is in the classroom setting where one is given a single dataframe and expected to write lines of code to do the analysis on that one dataframe. The user is unlikely to ever use that data again once the assignment is done and handed in.

However, in the real world, more data frames can be added to the collection of data in a particular project. Furthermore one often copies and pastes blocks of code to be used for something similar. Often one is borrowing from something one did a few months ago and cannot remember the nuances of what was being called from where. In these circumstances one gets drowned by the previous use of "attach."

Farrel
3

I prefer not to use attach(), as it is far too easy to run a batch of code several times, each time calling attach(). The data frame is added to the search path each time, extending it unnecessarily. Of course, good programming practice is to also detach() at the end of the block of code, but that is often forgotten.

Instead, I use xxx$y or xxx[,"y"]. It's more transparent.

Another possibility is to use the data argument available in many functions which allows individual variables to be referenced within the data frame. e.g., lm(z ~ y, data=xxx).
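The points above can be sketched with a small made-up data frame. The values in xxx are illustrative only.

```r
xxx <- data.frame(y = c(1, 2, 3, 4), z = c(2.1, 3.9, 6.2, 7.8))

n_before <- length(search())
attach(xxx)
attach(xxx)                    # rerunning the same code attaches a second copy
n_during <- length(search())   # two entries longer than before
detach(xxx)
detach(xxx)                    # each attach() needs its own detach()
n_after <- length(search())

# The data argument avoids the search path entirely.
fit <- lm(z ~ y, data = xxx)
```

Calling search() at the prompt is a quick way to check whether forgotten copies are still attached.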

Rorschach
Rob Hyndman
  • Sometimes I am calling from various data frames and global variables, and this system means never having an incorrect calculation performed. – Michelle Jan 26 '12 at 16:00
3

Just like Leoni said, with and within are perfect substitutes for attach, but I wouldn't completely dismiss it. I use it sometimes when I'm working directly at the R prompt and want to test some commands before writing them into a script. Especially when testing multiple commands, attach can be a more interesting, convenient and even harmless alternative to with and within, since after you run attach, the command prompt is clear for you to write inputs and see outputs.

Just make sure to detach your data after you're done!

Waldir Leoncio
2

While I, too, prefer not to use attach(), it does have its place when you need to persist an object (in this case, a data.frame) through the life of your program when you have several functions using it. Instead of passing the object into every R function that uses it, I think it is more convenient to keep it in one place and call its elements as needed.

That said, I would only use it if I know how much memory I have available and only if I make sure that I detach() this data.frame once it is out of scope.

Am I making sense?

Rorschach
AlexGilgur