Today I ran into a bug in my code caused by a data frame subset operation. I would like to know whether the behaviour I found is a bug in R or whether I am violating R semantics.

I am running R 2.15.2-61015 (Trick or Treat) on RHEL x86_64. I am using the `subset` function from the base package.

The following code should be reproducible; it was run in a clean R console started for the purpose of this test.

    > teste <- data.frame(teste0=c(1,2,3), teste1=c(3,4,5))
    > teste0 <- 1
    > teste1 <- 1

    > subset(teste, teste[,"teste0"]==1 & teste[,"teste1"]==1)
    [1] teste0 teste1
    <0 rows> (or 0-length row.names)

    > subset(teste, teste[,"teste0"]==teste0 & teste[,"teste1"]==teste1)
      teste0 teste1
    1      1      3
    2      2      4
    3      3      5

However, if I run the same logical expression outside the `subset` call:

    > teste[,"teste0"]==teste0 & teste[,"teste1"]==teste1
    [1] FALSE FALSE FALSE

I would expect both `subset` calls to yield an empty data frame. However, the second one returns the complete data frame. Is this a bug, or am I missing something about R environments and namespaces?

Thank you for your help, Miguel

Joshua Ulrich
mmgm

1 Answer

In this statement:

    subset(teste, teste[,"teste0"]==teste0 & teste[,"teste1"]==teste1)

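`teste0` means `teste$teste0`, because `subset` evaluates its condition with the columns of the data frame in scope, and those take precedence over variables of the same name in your workspace. The same holds for `teste1`. Each column is therefore compared with itself, which is `TRUE` for every row.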

In this statement:

    teste[,"teste0"]==teste0 & teste[,"teste1"]==teste1

`teste0` and `teste1` are the vectors that you defined above in your workspace (not the columns of the data frame).
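A minimal sketch of the difference, together with two possible ways to compare against the outer variables. It reuses the data from the question; the names `val0` and `val1` are hypothetical and introduced only for illustration.

```r
teste <- data.frame(teste0 = c(1, 2, 3), teste1 = c(3, 4, 5))
teste0 <- 1
teste1 <- 1

# Inside subset(), teste0 and teste1 resolve to the columns, so each
# column is compared with itself and every row is kept.
all_rows <- subset(teste, teste0 == teste0 & teste1 == teste1)

# Option 1: standard indexing. The condition is evaluated in the calling
# environment, where teste0 and teste1 are the scalars defined above.
no_rows <- teste[teste$teste0 == teste0 & teste$teste1 == teste1, ]

# Option 2: give the outer values names that do not clash with any column
# (val0 and val1 are hypothetical names).
val0 <- 1
val1 <- 1
no_rows2 <- subset(teste, teste0 == val0 & teste1 == val1)

nrow(all_rows)   # 3
nrow(no_rows)    # 0
nrow(no_rows2)   # 0
```

Avoiding name clashes between workspace variables and column names is probably the simplest fix when you want to keep using `subset`.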

Matthew Lundberg
  • @Matthew Thanks! Can you suggest a source where I could read about this in detail? – mmgm Nov 21 '12 at 19:34
  • See help(subset), especially the examples. – Matthew Lundberg Nov 21 '12 at 19:35
  • @Matthew Well, that makes me feel stupid... Basically, the vectors that I had defined changed their meaning inside the `subset` function. What I want to know is when this happens in R in general, not only in this particular case, because I gather it is quite common? Thanks again. – mmgm Nov 21 '12 at 19:40
  • @mmgm You might start by reading about how [scope](http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Scope) works in R, and also [this](http://stackoverflow.com/a/9863081/324364) question might be helpful. – joran Nov 21 '12 at 19:55
  • "In general" this doesn't happen in R. `subset` uses non-standard evaluation, i.e. it captures its input and evaluates it itself, using slightly different rules. The behaviour of `subset` is explained very well here: https://github.com/hadley/devtools/wiki/Evaluation – pete Nov 21 '12 at 20:46
  • @mmgm: The `subset` function creates a local environment where column names are added to the search path and do not need to be quoted. That is known as "non-standard evaluation", and it actually happens quite a bit. Even typing `help(subset)` is an example because the "standard" call would be `help('subset')`. Similar behavior can be seen with the use of `with`, `within`, and `transform`. – IRTFM Nov 21 '12 at 20:49
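The non-standard evaluation described in these comments can be sketched with `substitute` and `eval`. The function `my_subset` below is a hypothetical, simplified stand-in written for illustration; it is not the actual base R source of `subset`, and the variable `limit` is likewise an assumed name.

```r
# my_subset is a hypothetical, simplified stand-in for subset().
my_subset <- function(data, condition) {
  # Capture the condition as an unevaluated expression.
  expr <- substitute(condition)
  # Evaluate it with the columns of `data` in scope first; names not found
  # there are looked up in the caller's environment.
  keep <- eval(expr, data, parent.frame())
  data[keep & !is.na(keep), , drop = FALSE]
}

teste <- data.frame(teste0 = c(1, 2, 3), teste1 = c(3, 4, 5))
teste0 <- 1

# teste0 is found in the data frame first, so the column is compared
# with itself and all rows are kept.
res_shadowed <- my_subset(teste, teste0 == teste0)

# limit is not a column, so it is found in the calling environment.
limit <- 2
res_outer <- my_subset(teste, teste0 <= limit)

# with() evaluates its expression with the columns in scope as well.
col_sum <- with(teste, sum(teste0))
```

Here `res_shadowed` keeps all three rows, `res_outer` keeps the two rows where the column is at most 2, and `col_sum` is the sum of the `teste0` column rather than the workspace scalar.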