
While writing convenience functions for subset(), I ran into a strange situation where using equivalent logical statements returns different subsets. So, for example:

dat = data.frame(ttl.stims = c(4,4,8,8), change = c('big', 'small'))
dat
ttl.stims = 4

#logical statements are equivalent
dat$ttl.stims == 4
dat$ttl.stims == ttl.stims

#subset evaluates differently
subset(dat, dat$ttl.stims == 4)
subset(dat, dat$ttl.stims == ttl.stims)

I've been working around this by doing:

index = dat$ttl.stims == ttl.stims
subset(dat, index)

But I'm so curious about why the first two subsets don't produce identical results! Ideas? Thoughts? Pontifications?

machow
  • From `?subset`: "Warning: ... in particular the non-standard evaluation of argument ‘subset’ can have unanticipated consequences." – Joshua Ulrich Jun 05 '12 at 02:46
  • possible duplicate of [In R, why is `\[` better than `subset`?](http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset) – joran Jun 05 '12 at 03:18

1 Answer

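Because inside the call to `subset` the symbol `ttl.stims` is looked up among the columns of `dat` first, so it resolves to `dat$ttl.stims` rather than to the variable you defined outside. The condition therefore becomes `dat$ttl.stims == dat$ttl.stims`, which is TRUE for every row. I predict that the second call to `subset` returns the entire data frame.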
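A minimal sketch of the masking, reusing the data from the question. The variable name `target` is hypothetical, chosen only so that it does not clash with a column name, and the `[` version is one common alternative rather than the only fix:

```r
dat <- data.frame(ttl.stims = c(4, 4, 8, 8), change = c("big", "small"))
ttl.stims <- 4

# Inside subset(), ttl.stims resolves to the column of dat, not the global
# variable, so the column is compared with itself and every row matches.
masked <- subset(dat, dat$ttl.stims == ttl.stims)

# A variable whose name matches no column is found outside the data frame.
target <- 4  # hypothetical name, not from the original question
renamed <- subset(dat, ttl.stims == target)

# Indexing with `[` evaluates the condition in the calling environment,
# so here ttl.stims is the global variable.
indexed <- dat[dat$ttl.stims == ttl.stims, ]

nrow(masked)
nrow(renamed)
nrow(indexed)
```

With this data, `masked` should keep all four rows, while `renamed` and `indexed` should keep only the two rows where `ttl.stims` is 4.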

IRTFM
  • Ah, I changed the name of `ttl.stims` and it worked. Thanks for the insight--that's really good to know! I definitely need to read more into how R evaluates function calls.. – machow Jun 05 '12 at 15:19
  • In this case all you need to know is that inside `subset` or a `with` call the interpreter looks first at column names for a match, and only if it doesn't find a match does it look to the parent.frame (which might or might not be the `.GlobalEnv`). The use of `with` is likewise considered dangerous inside functions. – IRTFM Jun 05 '12 at 15:28