Do `data.table` secondary indices have type limitations?

Question

When a data.table secondary index is set on a numeric column, it does not seem to allow subsetting rows using the `on =` syntax. However, I couldn't find anything in the documentation indicating that only character columns can be secondary indices. Is the `on =` syntax limited to character columns?

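```r
library(data.table)
dt <- data.table(A = 1:10, B = letters[1:10])
setindex(dt, A, B)   # note: this creates one composite index on (A, B)
dt[on = "B", "c"]    # works: "c" is joined against column B
dt[on = "A", 3]      # does not join on A
```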

I can get `dt[on = "A", .(1)]` to work, along with `dt[on = "B", .("c")]`. Not sure why character works when not in a list, though. — thelatemail, Jul 31 '17 at 00:04
That's a great idea! It does indeed work when putting the query in a list. On looking at the documentation, it seems the authors intend queries to generally be wrapped in a list for secondary indexing. Feel free to submit your answer so I can accept it as correct — Bob, Jul 31 '17 at 00:16
I will wait to see if some of the more data.table savvy folks pop around with an actual reason why one works and the other doesn't. I think you are right that the list is preferred, but this seems a little strange. — thelatemail, Jul 31 '17 at 00:19
I should have looked at the "Keys and fast binary search" document more carefully; it says: "On single column key of character type, you can drop the `.()` notation and use the values directly when subsetting, like subset using row names on data.frames." This is a little unusual, since I have normally found indexing on key columns works without the `.()` syntax for numeric columns as well, but apparently this doesn't carry over into secondary indices. — Bob, Jul 31 '17 at 00:22
Nice find, the good ol' manual comes into play again! I think you can answer your own question now to reap those internet points! :-) — thelatemail, Jul 31 '17 at 00:26

score 5 · Accepted Answer · edited Jul 31 '17 at 05:46


In `?data.table`, the description of the argument `i` says:

character, list and data.frame input to `i` is converted into a `data.table` internally using `as.data.table`.

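As a result, a join is done, either on the key or on the columns given in `on=`. A bare number is not one of those input types; as in `[.data.frame`, it is treated as a row number, so no join happens.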
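A minimal sketch of that conversion in practice, assuming a small made-up table `dt`: the same lookup of `"y"` in column `B`, passed as a character vector, as a list and as a data.frame, should return the same row in each case, because all three are turned into a `data.table` and joined.

```r
library(data.table)

# Made-up example data (an assumption for illustration only).
dt <- data.table(A = c(3L, 1L, 2L), B = c("x", "y", "z"))

# Character, list and data.frame inputs to i are converted with
# as.data.table() and then joined on the column named in on=.
from_char  <- dt["y", on = "B"]
from_list  <- dt[list("y"), on = "B"]
from_frame <- dt[data.frame(B = "y", stringsAsFactors = FALSE), on = "B"]
```

Each result should be the single row with `A == 1`.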

The option to skip `.()` for character columns is also noted in the "Keys and fast binary search" vignette, `vignette("datatable-keys-fast-subset")`:

On single column key of character type, you can drop the `.()` notation and use the values directly when subsetting, like subset using row names on data.frames.
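A small sketch of the vignette's point, using a made-up table: with a single character key the value can go straight into `i`, whereas with an integer key the value has to be wrapped in `.()` to be used as a join value. The values of `A` are chosen (purely for illustration) so that the join and the row-number lookup hit different rows.

```r
library(data.table)

# Made-up example data (an assumption for illustration only).
dt <- data.table(A = c(5L, 2L, 9L), B = c("x", "y", "z"))

# Single-column character key: "y" is used directly as a join value.
setkey(dt, B)
keyed_char <- dt["y"]        # row where B == "y"

# Single-column integer key: dt is now sorted by A as (2, 5, 9).
setkey(dt, A)
keyed_num <- dt[.(2L)]       # join: row where A == 2
by_row    <- dt[2]           # bare number: second row, i.e. A == 5
```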

edited Jul 31 '17 at 05:46

Frank


answered Jul 31 '17 at 00:48

Bob


Re the second half of your answer, this has nothing to do with keys or indices, it's just an interface design for `DT[...]`. If you pass numbers without `.()` they are interpreted as row numbers. If you pass a string without `.()`, it is interpreted as a join (allowing the `.()` to be skipped for convenience when typing, I guess). And if you pass a single symbol, it is also interpreted as a join. – Frank Jul 31 '17 at 05:41
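To illustrate that this is independent of keys and indices, here is a sketch on a made-up table with no key or index set: the same number 3 selects a different row depending on whether it is passed bare or inside `.()` together with `on=`.

```r
library(data.table)

# Made-up example data (an assumption for illustration); no key, no index.
dt <- data.table(A = c(3L, 1L, 2L), B = c("x", "y", "z"))

# A bare number in i is a row number: this is row 3, where A == 2.
by_row <- dt[3]

# Wrapped in .(), the same number is a join value on A:
# this is the row where A == 3, which happens to be row 1.
by_join <- dt[.(3L), on = "A"]
```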
