
After reading the data.table FAQ (section 1.5), I had the impression that all three ways of addressing a column are more or less equivalent. But the output of [, mycol, with=FALSE], at least, is quite different from that of $mycol and [[mycol]]:

library(data.table)

dt1 <- fread(
  " id,colA,colB
   id1,3,xxx
   id2,0,zzz
   id3,NA,yyy
   id4,0,aaa
     ")

dt1$colA <- factor(dt1$colA)

myvar="colA"

dt1$colA
# [1] 3    0    <NA> 0   
# Levels: 0 3

dt1[[myvar]]
# [1] 3    0    <NA> 0   
# Levels: 0 3

dt1[, myvar, with=FALSE]
# colA
# 1:    3
# 2:    0
# 3:   NA
# 4:    0

So what is the exact difference between these three approaches? Can I assume that $mycol and [[mycol]] are always identical? And why does [, mycol, with=FALSE] "lose" the factor?

Thanks in advance.

Vasily A

1 Answer


The first part of your question, the difference between $ and [[, has been covered before in another question:

Indexing by [ is similar to atomic vectors and selects a list of the specified element(s).

Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.
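A small sketch of these rules on a plain R list (the list `x` is a hypothetical stand-in for a single column of the question's table). It shows that [[ accepts a computed name while $ does not, that $ partially matches names, and that [ returns a sub-list rather than the element:

```r
# Hypothetical list holding one factor, like the question's colA
x <- list(colA = factor(c(3, 0, NA, 0)))
myvar <- "colA"

# [[ accepts a computed index (a variable holding the name)
stopifnot(identical(x[[myvar]], x$colA))

# $ does not evaluate its argument: x$myvar looks for an element literally named "myvar"
stopifnot(is.null(x$myvar))

# $ partially matches names; [[ only does so when exact = FALSE
stopifnot(identical(x$col, x$colA))
stopifnot(is.null(x[["col"]]))
stopifnot(identical(x[["col", exact = FALSE]], x$colA))

# [ returns a sub-list, not the element itself
stopifnot(is.list(x[myvar]))
```

So $mycol and [[mycol]] return the same element whenever mycol is the exact, full name. They can differ when the name is only a prefix (partial matching) or when the name is stored in a variable.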

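As for the second part: the notation dt1[, ..myvar] (equivalent to dt1[, myvar, with = FALSE]) returns a data.table containing the column(s) named in myvar, not the column vector itself. The result is a one-column data.table, and that column is still of class factor. Nothing is lost; the output only looks different because it is a data.table being printed, and a data.table print does not show the "Levels:" line that a bare factor does.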
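A minimal check of that claim, assuming a data.table version that supports the `..` prefix (it appeared around data.table 1.10.2). For self-containment the table is rebuilt with data.table() instead of the question's fread() call; the names res1 and res2 are illustrative:

```r
library(data.table)

# Rebuild the question's table directly so the example needs no input text
dt1 <- data.table(id   = c("id1", "id2", "id3", "id4"),
                  colA = factor(c(3, 0, NA, 0)),
                  colB = c("xxx", "zzz", "yyy", "aaa"))
myvar <- "colA"

res1 <- dt1[, myvar, with = FALSE]  # with = FALSE syntax
res2 <- dt1[, ..myvar]              # `..` prefix syntax

# Both return a one-column data.table, not a vector
stopifnot(is.data.table(res1), ncol(res1) == 1L)
stopifnot(identical(names(res1), names(res2)), identical(res1$colA, res2$colA))

# The column inside is still a factor with the original levels
stopifnot(is.factor(res1$colA), identical(levels(res1$colA), c("0", "3")))

# Pulling the column out gives the same vector as $ and [[
stopifnot(identical(res1[[1]], dt1$colA), identical(dt1[[myvar]], dt1$colA))
```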

The data.frame equivalent of dt1[, ..myvar] would be as.data.frame(dt1)[, myvar, drop = FALSE].
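A short sketch of the role of drop here, again rebuilding a reduced version of the question's table with data.table() (the names df1, sub_keep and sub_drop are illustrative). With drop = FALSE the result stays a one-column data.frame; with the default drop = TRUE, a single selected column is simplified to the factor itself:

```r
library(data.table)

# Reduced version of the question's table
dt1 <- data.table(id   = c("id1", "id2", "id3", "id4"),
                  colA = factor(c(3, 0, NA, 0)))
myvar <- "colA"

df1 <- as.data.frame(dt1)

# drop = FALSE keeps the result as a one-column data.frame
sub_keep <- df1[, myvar, drop = FALSE]
# the default drop = TRUE simplifies a single column to the vector itself
sub_drop <- df1[, myvar]

stopifnot(is.data.frame(sub_keep), is.factor(sub_keep$colA))
stopifnot(is.factor(sub_drop), identical(sub_drop, dt1$colA))
```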

MichaelChirico
Blue Magister