R data.table - row subsetting behavior - NA values

Question

I have noticed that data.frame and data.table row subsetting differ when it comes to NA values.

Clean code:

DF <- data.frame(COL1 = c(1, 2, NA))

DF[DF$COL1 == 1, ]
DF[DF$COL1 != 1, ]

DT <- data.table::data.table(COL1 = c(1, 2, NA))
DT[COL1 == 1, ]
DT[COL1 != 1, ]

Code with results:

> DF <- data.frame(COL1 = c(1, 2, NA))
> DF[DF$COL1 == 1, ]
[1]  1 NA
> DF[DF$COL1 != 1, ]
[1]  2 NA
> DT <- data.table::data.table(COL1 = c(1, 2, NA))
> DT[COL1 == 1, ]
   COL1
1:    1
> DT[COL1 != 1, ]
   COL1
1:    2

Is there any special reason for that?

Thanks

Besides the dupe link, there is also more discussion here: http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently — Frank, Dec 14 '16 at 18:48

score 1 · Accepted Answer · answered Dec 14 '16 at 18:36

From the help file, ?data.table, under the discussion of i:

integer and logical vectors work the same way they do in [.data.frame except logical NAs are treated as FALSE.

In data.frame, a logical NA in i stays NA, so the result includes an all-NA row for it.
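
To make the difference concrete, here is a minimal sketch in base R. The names idx, with_na and no_na are only illustrative, and the data.table lines are left commented out so that the snippet runs without the package installed. It assumes the same one-column data as in the question.

```r
DF <- data.frame(COL1 = c(1, 2, NA))

# The comparison itself produces a logical NA for the missing value.
idx <- DF$COL1 == 1          # TRUE FALSE NA

# [.data.frame keeps the NA index and returns an all-NA row for it.
# drop = FALSE keeps the result a data.frame even with a single column.
with_na <- DF[idx, , drop = FALSE]

# which() discards NA, so this mimics data.table treating NA as FALSE.
no_na <- DF[which(idx), , drop = FALSE]

# Going the other way, NA rows can be requested explicitly in data.table:
# DT <- data.table::data.table(COL1 = c(1, 2, NA))
# DT[is.na(COL1) | COL1 != 1]
```

Wrapping the condition in which() is a common way to get the data.table-style result from a data.frame, while adding is.na() to the condition is one way to keep the NA rows in a data.table.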

lmo

Ok, thanks. If you come across any discussion of it, please share. – Samuel Barbosa Dec 14 '16 at 18:47
It seems that `data.frame` always brings in NA rows, while `data.table` never does. Since NA could be interpreted as "no information", there are many ways of dealing with it. Both seem a little inconsistent to me. In `data.frame`, when I subset with `COL1 != 1`, it excludes rows where the value is known to be 1 and keeps all others. However, when I subset with `COL1 == 1`, it returns rows with value 1 and also rows whose value is not known to be 1 or not. – Samuel Barbosa Dec 14 '16 at 18:53
