Why might I come up with very different answers for these two lines of code:

    nrow(aql[(aql$`Land Use`=="RESIDENTIAL" & aql$`Location Setting`=="SUBURBAN"),])
    [1] 4514

...and...

    nrow(subset(aql, (`Location Setting`=="SUBURBAN" & `Land Use`=="RESIDENTIAL")))
    [1] 3527

Conner M.
  • It's hard to say without seeing the full data, but there are a few unusual instances where things might get messed up when using `subset()` instead of brackets. See here: https://stackoverflow.com/questions/9860090/why-is-better-than-subset – coip Aug 25 '22 at 23:32

1 Answer

It's hard to say without a reproducible example, but the difference is most likely caused by `NA` values in the `Location Setting` or `Land Use` columns.

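The `subset()` function treats an `NA` in the condition as `FALSE` and drops that row, while `[` does not: a logical index that is `NA` returns a row filled entirely with `NA`s, and `nrow()` counts those rows too.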
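A minimal sketch of that difference, assuming a made-up data frame `df` whose columns `land_use` and `setting` are hypothetical stand-ins for the asker's `aql` columns:

```r
# Toy data (hypothetical); two rows have an NA that matters for the condition.
df <- data.frame(
  land_use = c("RESIDENTIAL", "RESIDENTIAL", NA, "COMMERCIAL", NA),
  setting  = c("SUBURBAN", NA, "SUBURBAN", "SUBURBAN", "RURAL"),
  stringsAsFactors = FALSE
)

# The condition evaluates to TRUE, NA, NA, FALSE, FALSE.
cond <- df$land_use == "RESIDENTIAL" & df$setting == "SUBURBAN"

# `[` keeps the TRUE row and adds one all-NA row per NA in the index.
n_bracket <- nrow(df[cond, ])

# subset() keeps only the rows where the condition is TRUE.
n_subset <- nrow(subset(df, land_use == "RESIDENTIAL" & setting == "SUBURBAN"))

n_bracket  # 3
n_subset   # 1
```

If the real data behaves the same way, `sum(is.na(cond))` on the actual condition should account for the gap between the two counts.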

R's logical operators can look unintuitive when `NA` is combined with `TRUE`/`FALSE` values, so watch out for something like this:

    NA & FALSE
    [1] FALSE

    NA & TRUE
    [1] NA
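If the goal is to keep using `[` but ignore rows where the condition is `NA`, one option is to wrap the condition in `which()`, or to exclude the `NA`s explicitly. This is only a sketch on hypothetical toy data (`d`, `site`, and `type` are made-up names), not the asker's actual data:

```r
# Hypothetical toy data; one row has a missing `type`.
d <- data.frame(
  site = c("A", "B", "C", "D"),
  type = c("URBAN", NA, "RURAL", "URBAN"),
  stringsAsFactors = FALSE
)

keep <- d$type == "URBAN"            # TRUE NA FALSE TRUE

with_na <- d[keep, ]                 # the NA index yields an all-NA row
clean   <- d[which(keep), ]          # which() returns only the TRUE positions
same    <- d[keep & !is.na(keep), ]  # NA & FALSE is FALSE, so the NA is dropped

nrow(with_na)  # 3
nrow(clean)    # 2
nrow(same)     # 2
```

Both alternatives should give the same row count as `subset()` for the same condition.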

whopper510
  • It is intuitive if you consider `NA` as a true or false value. `NA & FALSE` will always be false, but we are not sure about `NA & TRUE`. – qwr Apr 11 '18 at 21:10