I have a simple question, but I can't figure out a simple solution:

library(data.table)
plouf <- data.table(1:10,letters[1:10])
plouf[V1 %in% c(3,1),V2]

[1] "a" "c"

I would like the output to keep the order of the subsetting vector, i.e. "c" "a". What are the possibilities?

I have

sapply(c(3,1),function(x){plouf[V1 == x,V2]})

but I find it ugly.

edit

I have

setkey(plouf,V1)
plouf[c(3,1),V2]

which is surely the idiomatic way for data.table. Still, I am curious what other solutions exist.

denis
  • Using `match`: `plouf[, V2[match(c(3, 1), V1)]]` returns `"c" "a"`. The `setkey` option would also reorder the table; if you don't want that, `match` is an option. – akrun Jul 24 '19 at 20:38

2 Answers

Using data.table keys will accomplish what you're going for here; the "Keys and fast binary search based subset" vignette explains the usage.

library(data.table)
plouf <- data.table(1:10,letters[1:10])

## Set a key
setkey(plouf,V1)
## Use .() syntax for key subsetting to get associated values of V2
plouf[.(c(3,1)),V2]
#[1] "c" "a"
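
To see why the `.()` wrapper matters, here is a small sketch on a hypothetical table `dt` (not the `plouf` data above) whose key values differ from its row numbers. The names `dt`, `by_row`, `by_key` and `no_match` are made up for illustration.

```r
library(data.table)

# Hypothetical table: key values (10, 20, 30) are not row numbers
dt <- data.table(V1 = c(20L, 30L, 10L), V2 = c("b", "c", "a"))
setkey(dt, V1)  # physically sorts the table by V1: 10, 20, 30

# A bare numeric vector in i selects by row number
by_row <- dt[c(3, 1), V2]

# Wrapping in .() looks the values up in the key column instead
by_key <- dt[.(c(30L, 10L)), V2]

# Key values 3 and 1 do not exist here, so the lookup yields NA
no_match <- dt[.(c(3L, 1L)), V2]
```

With `plouf`, both forms happen to agree only because `V1` is identical to the row number.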
Matt Summersgill
  • yes, I had this one a bit after asking. Thanks for the link – denis Jul 24 '19 at 20:48
  • If you want to retain the order of the data, there is no need to set a key. Just use the `on=` argument for an ad-hoc join/lookup. This is mentioned in the Details part of `?data.table` but apparently not the vignette yet. – Frank Jul 24 '19 at 21:04
  • Oh, didn't notice before, but that is row-number subsetting, not key subsetting. You need to wrap like `.(c(3,1))` or similar. – Frank Jul 24 '19 at 21:26
  • Good catch. I used the `.()` syntax in my original answer, but confused myself into thinking it wasn't necessary and removed it in a subsequent edit; I will revert it now. What syntax would you suggest for the ad-hoc `on` usage? I haven't ever used that method, and after reading through the docs I've tried a couple of approaches without any success. – Matt Summersgill Jul 24 '19 at 21:29
  • Maybe something like `data.table(ID=LETTERS[1:10], VAL=1:10)[.(ID=c("C","A")), on=.(ID)]`. See https://stackoverflow.com/a/20057411/1989480 – chinsoon12 Jul 25 '19 at 00:28
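
Applied to the question's data, the ad-hoc `on=` lookup suggested in the comments might look like the sketch below. It assumes integer lookup values and names the lookup column `V1` explicitly; `res` is just an illustrative variable name.

```r
library(data.table)
plouf <- data.table(1:10, letters[1:10])

# Ad-hoc join: look up V1 values 3 and 1 without setting a key.
# Rows come back in the order of the lookup vector.
res <- plouf[.(V1 = c(3L, 1L)), on = .(V1), V2]

# No key was set, so the table itself is not reordered
key(plouf)  # NULL
```

Unlike `setkey`, this leaves the row order of `plouf` untouched, which is the point Frank raises above.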
Here is one option with match, which can be used in data.table and in base R as well. Unlike %in%, match returns the position index of the first match for each value, in the order the values were given, and this index can be used to get the corresponding elements of the other column 'V2'.

plouf[, V2[match(c(3, 1), V1)]]
#[1] "c" "a"

plouf[, match(c(3, 1), V1)] # returns numeric index
#[1] 3 1
plouf[, V1 %in% c(3, 1)] # returns logical vector
#[1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Because %in% returns a logical vector with one entry per row, using it to extract elements returns those at each TRUE position in row order, i.e. it extracts from the 1st and 3rd positions instead of the 3rd and 1st.
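
The same idea in base R, as a minimal sketch on a hypothetical data.frame `df`. The value 42 is added only to show that match returns NA for values that are not found, which you may want to drop afterwards.

```r
# Hypothetical base R equivalent of the plouf table
df <- data.frame(V1 = 1:10, V2 = letters[1:10], stringsAsFactors = FALSE)

wanted <- c(3, 1, 42)          # 42 is not present in V1
pos <- match(wanted, df$V1)    # positions in the order of 'wanted'; NA if absent
res <- df$V2[pos]              # indexing with NA yields NA

# Drop the values that were not found
res_clean <- res[!is.na(res)]
```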

akrun