I have noticed that data.frame and data.table row subsetting differ when it comes to NA values.

Clean code:

DF <- data.frame(COL1 = c(1, 2, NA))

DF[DF$COL1 == 1, ]
DF[DF$COL1 != 1, ]

DT <- data.table::data.table(COL1 = c(1, 2, NA))
DT[COL1 == 1, ]
DT[COL1 != 1, ]

Code with results:

> DF <- data.frame(COL1 = c(1, 2, NA))
> DF[DF$COL1 == 1, ]
[1]  1 NA
> DF[DF$COL1 != 1, ]
[1]  2 NA
> DT <- data.table::data.table(COL1 = c(1, 2, NA))
> DT[COL1 == 1, ]
   COL1
1:    1
> DT[COL1 != 1, ]
   COL1
1:    2

Is there any special reason for that?

Thanks

1 Answer

From the help file, ?data.table, under the discussion of i:

integer and logical vectors work the same way they do in [.data.frame except logical NAs are treated as FALSE.

In data.frame, a logical NA in the row index stays NA, so the result contains an NA entry for that position.
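As a rough sketch of what is going on and how to get the same result from both classes: comparing against NA yields NA, and the two classes only differ in what they do with that NA in the index. The snippet below reuses `COL1` from the question; the other variable names are just illustrative, and the data.table part only runs if the package is installed.

```r
# Comparing against NA yields NA, not TRUE or FALSE.
x <- c(1, 2, NA)
cond <- x == 1
print(cond)  # TRUE FALSE NA

DF <- data.frame(COL1 = x)

# [.data.frame keeps the NA in the index and returns an NA entry for it.
# which() drops NA positions, so only the real matches remain.
# drop = FALSE keeps the result a data.frame even with a single column.
eq_df  <- DF[which(DF$COL1 == 1), , drop = FALSE]
neq_df <- DF[which(DF$COL1 != 1), , drop = FALSE]

# subset() also treats NA in the condition as FALSE.
eq_sub <- subset(DF, COL1 == 1)

# To keep the NA rows on purpose, ask for them explicitly.
neq_or_na <- DF[is.na(DF$COL1) | DF$COL1 != 1, , drop = FALSE]

if (requireNamespace("data.table", quietly = TRUE)) {
  DT <- data.table::data.table(COL1 = x)
  # data.table treats the NA as FALSE, so the NA row must be requested.
  dt_neq_or_na <- DT[is.na(COL1) | COL1 != 1]
  print(dt_neq_or_na)
}
```

With `which()` or `subset()`, the data.frame results match what data.table returns in the question; the explicit `is.na()` term is one way to get the NA rows back in either class when they are wanted.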

lmo
  • Ok, thanks. If you come across any discussion of it, please share. – Samuel Barbosa Dec 14 '16 at 18:47
  • It seems that `data.frame` always brings in NA rows, while `data.table` never does. Since NA can be interpreted as "no information", there are many ways of dealing with it. Both seem a little inconsistent to me. In `data.frame`, when I subset with `COL1 != 1`, it excludes rows where the value is known to be 1 and keeps all others. However, when I subset with `COL1 == 1`, it returns rows with value 1 as well as rows whose value is not known to be 1 or not. – Samuel Barbosa Dec 14 '16 at 18:53