When a data.table secondary index is set on a numeric column, it does not seem to allow subsetting rows using the on = syntax. However, I couldn't find anything in the documentation indicating that only character columns can be secondary indices. Is the on = syntax limited to character columns?

library(data.table)
dt <- data.table(A = 1:10, B = letters[1:10])
setindex(dt, A, B)
dt[on = "B", "c"]
dt[on = "A", 3]
Bob
  • I can get `dt[on = "A", .(1)]` to work, along with `dt[on = "B", .("c")]` - not sure why character works when not in a list though. – thelatemail Jul 31 '17 at 00:04
  • That's a great idea! It does indeed work when putting the query in a list. On looking at the documentation, it seems the authors intend queries to generally be wrapped in a list for secondary indexing. Feel free to submit your answer so I can accept it as correct. – Bob Jul 31 '17 at 00:16
  • I will wait to see if some of the more data.table savvy folks pop around with an actual reason why one works and the other doesn't. I think you are right that the list is preferred, but this seems a little strange. – thelatemail Jul 31 '17 at 00:19
  • I should have looked at the "Keys and fast binary search" document more carefully; it says: "On single column key of character type, you can drop the `.()` notation and use the values directly when subsetting, like subset using row names on data.frames." This is a little unusual, since I have normally found indexing on key columns works without the `.()` syntax for numeric columns as well, but apparently this doesn't carry over into secondary indices. – Bob Jul 31 '17 at 00:22
  • Nice find, the good ol' manual comes into play again! I think you can answer your own question now to reap those internet points! :-) – thelatemail Jul 31 '17 at 00:26

1 Answer

In ?data.table:

i

character, list and data.frame input to i is converted into a data.table internally using as.data.table.

As a result, a join is done, either using a key or just with on=.

The option to skip .() for character columns is also noted in the "Keys and fast binary search" vignette, vignette("datatable-keys-fast-subset"):

On single column key of character type, you can drop the .() notation and use the values directly when subsetting, like subset using row names on data.frames.

Frank
Bob
  • Re the second half of your answer, this has nothing to do with keys or indices, it's just an interface design for `DT[...]`. If you pass numbers without `.()` they are interpreted as row numbers. If you pass a string without `.()`, it is interpreted as a join (allowing the `.()` to be skipped for convenience when typing, I guess). And if you pass a single symbol, it is also interpreted as a join. – Frank Jul 31 '17 at 05:41
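To make the distinction in this comment concrete, here is a minimal sketch. It uses a hypothetical table in which column A runs in reverse (10:1) instead of the 1:10 from the question, so that a row number and a value of A can no longer be confused. A bare number passed together with on= is deliberately left out, since what happens there may depend on the data.table version.

```r
library(data.table)

# Hypothetical variant of the question's table: A is reversed so that
# row number 3 and the value A == 3 refer to different rows.
dt <- data.table(A = 10:1, B = letters[1:10])

# A bare number in i is a row number: this is the third row.
by_row <- dt[3]

# Wrapping the value in .() makes i a list, so it is joined to column A.
by_join <- dt[.(3L), on = "A"]

# A bare string in i is treated as a join, so .() can be skipped.
by_chr <- dt["c", on = "B"]

# The explicit list form of the same character lookup.
by_list <- dt[.("c"), on = "B"]
```

Here by_row and by_chr both pick out the row where B is "c", while by_join picks out the row where A equals 3, which sits at a different position. No setindex() call is needed for any of these, which fits the point that the behaviour comes from how `DT[...]` interprets i and not from keys or indices.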