1

I feel as if constantly in R, I get weird naming conflicts between attached dataframes and other objects, attaches/detaches not working as expected (just had two copies of the same dataframe attached, not even sure if they were identical or not) and a whole host of softly typed language specific issues. Code that worked an hour ago suddenly produces new errors etc.

Is there a best practice for dealing with this sort of stuff? Am I missing efficiency if I stick to naming dataframes with single letters and then not attaching at all?

Richie Cotton
Benjamin Lindqvist
  • You should definitely not be using attach() at all; it's usually a recipe for confusion. – joran Oct 21 '14 at 04:13
  • NEVER USE ATTACH! Distrust any books that use it. – IRTFM Oct 21 '14 at 04:49
  • The reason you don't need attach() is that many commands (like `lm`) have a `data=` argument where you can pass the data.frame name, and R will resolve the variables in your formula within that data.frame. Alternatively, you can use a function like `with()` to avoid having to re-type the data.frame name a bunch of times. At the very least, be sure to use `detach()` to remove every data.frame you `attach()` as soon as you are done with it. – MrFlick Oct 21 '14 at 04:54

2 Answers

6

attaches/detaches (sic) not working as expected

As mentioned by joran and BondedDust, using attach is always a bad idea, because it causes silly, obscure bugs like the ones you found.
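To make the problem concrete, here is a minimal sketch of how two different copies of the same data frame can end up attached at once. The data frame name scores and its column x are made up for this illustration; the point is only that attach() puts a copy on the search path, so later changes to the data frame are not reflected in it.

```r
# Hypothetical data frame, used only for this illustration.
scores <- data.frame(x = 1:3)

attach(scores)                  # puts a copy of the columns on the search path
scores$x <- scores$x * 10       # updates the data frame, not the attached copy
stale <- x                      # 1 2 3: still the values from before the update

attach(scores)                  # a second entry; R reports that x is masked
n_copies <- sum(search() == "scores")   # 2
fresh <- x                      # 10 20 30: the newer copy masks the older one

detach(scores)                  # removes only the most recently attached copy
detach(scores)                  # removes the older one as well
```

Which value you get for x depends on the order of attaches and on whatever else is in your workspace, which is why code that worked an hour ago can start behaving differently.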

naming dataframes with single letters

Don't do this either! Give your variables meaningful names, so that your code is easier to understand when you come back to it six months later.


If your problem is that you don't like repeatedly typing the name of a data frame to access columns, then use functions with special evaluation that avoid that need.

For example,

some_sample_data <- data.frame(x = 1:10, y = runif(10))

Subsetting

Repeated typing, hard work:

some_sample_data[some_sample_data$x > 3 & some_sample_data$y > 0.5, ]

Easier alternative using subset:

subset(some_sample_data, x > 3 & y > 0.5)

Reordering

Repeated typing, hard work:

order_y <- order(some_sample_data$y)
some_sample_data[order_y, ]

Easier using arrange from plyr:

library(plyr)
arrange(some_sample_data, y)

Transforming

Repeated typing, hard work:

some_sample_data$z <- some_sample_data$x + some_sample_data$y

Easier using with, within or mutate (the last one from plyr):

some_sample_data$z <- with(some_sample_data, x + y)
some_sample_data <- within(some_sample_data, z <- x + y)
some_sample_data <- mutate(some_sample_data, z = x + y)  # needs library(plyr)

Modelling

As mentioned by MrFlick, many functions, particularly modelling functions, have a data argument that lets you avoid repeating the data name.

Repeated typing, hard work:

lm(some_sample_data$y ~ some_sample_data$x)

Using a data argument:

lm(y ~ x, data = some_sample_data)

You can see all the functions in the stats package that have a data argument using:

library(sig)
stats_sigs <- list_sigs(pkg2env(stats))
Filter(function(fn) "data" %in% names(fn$args), stats_sigs)
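If you would rather not install an extra package, roughly the same list can be built with base R alone. This is only a sketch of an alternative, not what sig does internally; the helper name has_data_arg is made up here, and it assumes the stats package is attached (it is by default).

```r
# Collect everything exported by the attached stats package.
stats_env  <- as.environment("package:stats")
stats_objs <- mget(ls(stats_env), envir = stats_env)

# Hypothetical helper: TRUE for functions with a formal argument named "data".
has_data_arg <- function(obj) {
  is.function(obj) && "data" %in% names(formals(obj))
}

# Names of the stats functions that accept a data argument.
data_fns <- names(Filter(has_data_arg, stats_objs))
head(data_fns)
```

This only checks for an argument literally named data, so generics that pass it on through ... will not show up.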
Richie Cotton
1

It is better to keep a set of related data objects in a new environment of their own. For example, I normally create an environment called e with this command:

e <- new.env()

Then you can access the individual objects in the environment with e$your_var.

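Other benefits:

  1. You can use eapply to apply a function to every object in the environment.
  2. ls(e) lists the objects it contains.
  3. rm(list = ls(e), envir = e) removes them all at once.
  4. It avoids conflicts between your local variables and the variables of functions that you want to create.
  5. ...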
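A short sketch of how that might look in practice. The object names heights and weights are invented for the example; only new.env, ls, eapply and rm come from the points above.

```r
e <- new.env()
e$heights <- c(1.6, 1.8)    # hypothetical example data
e$weights <- c(60, 80)

# A workspace variable with the same name does not touch the one in e.
heights <- "something unrelated"

object_names <- ls(e)       # "heights" "weights"
means <- eapply(e, mean)    # named list with the mean of each object in e

# Clear out the environment without touching the rest of the workspace.
rm(list = ls(e), envir = e)
remaining <- ls(e)          # character(0)
```

Note that eapply does not guarantee the order of its result, so it is safer to pick elements by name, for example means$weights.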
user1436187