R - Difference in subsetting with `subset()` vs. brackets [ ]?

Question

Why might I come up with very different answers for these two lines of code:

    nrow(aql[(aql$`Land Use`=="RESIDENTIAL" & aql$`Location Setting`=="SUBURBAN"),])
[1] 4514

...and...

    nrow(subset(aql, (`Location Setting`=="SUBURBAN" & `Land Use`=="RESIDENTIAL")))
[1] 3527

It's hard to say without seeing the full data, but there are a few unusual instances where things might get messed up when using `subset()` instead of brackets. See here: https://stackoverflow.com/questions/9860090/why-is-better-than-subset — coip, Aug 25 '22 at 23:32

score 0 · Answer 1 · answered Sep 15 '17 at 15:32


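It's hard to say without a reproducible example, but the likely cause is `NA` values in the `Location Setting` or `Land Use` columns.

`subset()` treats an `NA` in the condition as `FALSE` and drops those rows, while `[` returns a row of all `NA`s for each `NA` in the logical index, and `nrow()` counts those rows too.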
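To see the effect in isolation, here is a minimal sketch on made-up data. The data frame `df` and its columns `land` and `setting` are hypothetical stand-ins, not the asker's `aql`.

```r
# Toy data; names and values are illustrative assumptions only.
df <- data.frame(
  land    = c("RESIDENTIAL", "RESIDENTIAL", NA, "COMMERCIAL"),
  setting = c("SUBURBAN", NA, "SUBURBAN", "SUBURBAN"),
  stringsAsFactors = FALSE
)

# The condition is NA wherever one side is NA and the other side is TRUE.
cond <- df$land == "RESIDENTIAL" & df$setting == "SUBURBAN"

by_bracket <- df[cond, ]  # keeps the TRUE row plus one all-NA row per NA in cond
by_subset  <- subset(df, land == "RESIDENTIAL" & setting == "SUBURBAN")  # TRUE rows only

n_bracket <- nrow(by_bracket)
n_subset  <- nrow(by_subset)
n_na      <- sum(is.na(cond))  # number of rows where the condition is unknown
```

If `NA`s are the cause in the real data, the gap between the two row counts should equal `sum(is.na(cond))` for the same condition, which would be a quick check to run on `aql`.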

R's logical operators can look unintuitive when `NA` is combined with `TRUE`/`FALSE`, so watch out for cases like this:

    NA & FALSE
[1] FALSE

    NA & TRUE
[1] NA
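Because `NA & TRUE` stays `NA`, bracket subsetting needs the `NA`s handled explicitly if it should behave like `subset()`. The sketch below shows a few common ways to do that; `df`, `land`, and `setting` are hypothetical names on made-up data, not the asker's `aql`.

```r
# Toy data; names and values are illustrative assumptions only.
df <- data.frame(
  land    = c("RESIDENTIAL", "RESIDENTIAL", NA, "COMMERCIAL"),
  setting = c("SUBURBAN", NA, "SUBURBAN", "SUBURBAN"),
  stringsAsFactors = FALSE
)
cond <- df$land == "RESIDENTIAL" & df$setting == "SUBURBAN"

# which() returns only the positions where cond is TRUE, so NA and FALSE are dropped.
via_which <- df[which(cond), ]

# Explicitly turn NA into FALSE before indexing.
via_mask <- df[cond & !is.na(cond), ]

# %in% never returns NA: a missing value simply does not match.
via_in <- df[df$land %in% "RESIDENTIAL" & df$setting %in% "SUBURBAN", ]

# Reference result from subset() for comparison.
via_subset <- subset(df, land == "RESIDENTIAL" & setting == "SUBURBAN")
```

Which form to use is mostly a matter of taste; `which()` is the shortest, while the `!is.na()` mask makes the intent explicit.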


whopper510

It is intuitive if you treat `NA` as an unknown value that is either true or false. `NA & FALSE` is false whichever it turns out to be, but `NA & TRUE` could go either way, so it stays `NA`. – qwr Apr 11 '18 at 21:10
