This post concerns subsetting data using package data.table based on a compound condition including a logical AND operator, in particular differences in results obtained with & vs &&.
Environment: R version 3.2.1 (2015-06-18), x86_64-w64-mingw32/x64 (64-bit), Windows 10 Pro, data.table 1.9.4.
I’m subsetting a data.table used in a regression call; details of the model are suppressed below, but the data clause of the call is reproduced in full.
lm( y ~ u + v + w, data=DT[condo != 1 &<&> apt != 1] )
Including the second & (shown in angle brackets) gives the alternate form of the expression.
Data.table DT has approximately 25,000 rows. Variables condo and apt are never-null dummies taking values in {0,1}. As it turns out in the instance I’m working on, variable apt is always 0.
Using a single & selects rows of DT as desired, excluding rows where condo == 1. When both ampersands && are used, however, no rows are excluded and the regression is run against all of DT.
So my question(s): Why does this happen? How is package data.table processing the i condition against the rows of DT? Does the distinctive behavior of && with respect to condo[1] and apt[1] explain the observed behavior? (In the first row of the data.table, condo = 0 and apt = 0.)
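A minimal sketch of that hypothesis, using base R only so it runs without data.table. The table toy and its values are hypothetical stand-ins for DT, and it is an assumption here that data.table recycles a length-one logical i over all rows the way a data.frame does. The first elements are indexed explicitly because && only ever considered the first element of each operand in R 3.2.1, whereas R 4.3.0 and later raise an error when && is given operands longer than one.

```r
# Hypothetical stand-in for DT; the first row has condo = 0 and apt = 0.
toy <- data.frame(condo = c(0, 1, 0, 1, 0),
                  apt   = c(0, 0, 0, 0, 0))

# Vectorized AND: yields one logical value per row.
keep_elementwise <- toy$condo != 1 & toy$apt != 1

# Scalar AND: written on the first elements explicitly, which is what
# && examined in older R; newer R rejects longer operands outright.
keep_scalar <- toy$condo[1] != 1 && toy$apt[1] != 1

rows_elementwise <- toy[keep_elementwise, ]  # rows with condo == 1 are dropped
rows_scalar      <- toy[keep_scalar, ]       # a single TRUE is recycled over every row

print(nrow(rows_elementwise))  # 3
print(nrow(rows_scalar))       # 5
```

If this sketch matches what happens inside DT[...], the && form reduces to a single TRUE whenever the first row passes both tests, which would be consistent with no rows being excluded.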
And a bonus question: Under what conditions should a condition such as condo != 1 be written as condo != 1L, given R’s storage of (undeclared) ints as doubles? This isn’t just an idle question; data subsetting based on the values of dummies arises frequently in my work.
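For reference, a small sketch of how the two literal forms compare in base R. The vectors dummy_dbl and dummy_int are hypothetical stand-ins for a dummy stored as double and as integer. Comparison operators coerce integer to double before comparing, so the two spellings give the same logical result in this sketch; the type difference shows up only in exact checks such as identical().

```r
# Hypothetical dummies: the same values stored with different types.
dummy_dbl <- c(0, 1, 0)      # literals without the L suffix are doubles
dummy_int <- c(0L, 1L, 0L)   # the L suffix requests integer storage

type_dbl <- typeof(dummy_dbl)  # "double"
type_int <- typeof(dummy_int)  # "integer"

# != coerces integer to double, so both spellings of the constant agree.
same_on_double  <- identical(dummy_dbl != 1, dummy_dbl != 1L)
same_on_integer <- identical(dummy_int != 1, dummy_int != 1L)

# Exact type checks do distinguish the two literals.
literals_identical <- identical(1, 1L)

print(c(same_on_double, same_on_integer, literals_identical))
```

Whether the 1L form brings any practical benefit inside a data.table i expression (for example with keyed or indexed columns) is not something this sketch addresses.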