11

How can I dynamically lookup multiple fields and add by reference using character vector variable as argument. In below case I want to lookup two columns and get rid of i. prefix in them. Of course they can override already existing columns with the same name.

library(data.table)
set.seed(1)
ID <- data.table(id = 1:3, meta = rep(1,3), key = "id")
JN <- data.table(idd = sample(ID$id, 3, FALSE), value = sample(letters, 3, FALSE), meta = rep(1,3), key = "idd")
select <- c("value","meta") # my fields to lookup
j.lkp <- call(":=", select, lapply(paste0("i.",select), as.symbol))
j.lkp
# `:=`(c("value", "meta"), list(i.value, i.meta))
ID[JN, eval(j.lkp)]
# Error in eval(expr, envir, enclos) : could not find function "i.value"
ID[JN, `:=`(c("value", "meta"), list(i.value, i.meta))]
#    id meta value
# 1:  1    1     x
# 2:  2    1     v
# 3:  3    1     f

I'm aware of similar question but this one asks for vectorized argument during join and directly building call for j.
edit: I'm aware I can do it using .SDcols but then I cannot perform this by reference

Community
  • 1
  • 1
jangorecki
  • 16,384
  • 4
  • 79
  • 160
  • If you're replacing the entire column (as opposed to some matched subset of rows), I don't think modifying by reference does a lot for you. A `data.frame` is also a list of pointers to vectors; and I think they are modified in roughly the same way, like if you did `ID$value <- JN[match(idd,ID$id),value]`. The matching may be faster with `data.table` because you've set the keys, though. – Frank May 26 '15 at 21:10
  • Actually, scratch that, maybe you could illustrate what you mean about doing this with `.SDcols`... I don't think I follow. – Frank May 26 '15 at 21:12
  • Some kind of **first join, then subset**: `JN[ID][, setnames(.SD,"i....","..."),.SDcols=c(...)]`. Good point on matching whole column but I don't to sacrifice join speed (will do it also on composite key). – jangorecki May 26 '15 at 21:22
  • 1
    I guess it would be nice if we had additional shortcuts for columns and column names on both the left and right hand side of `j`, like `ID[JN,(.SDcols):=i.SD,.SDcols=select]` – Frank May 26 '15 at 21:41
  • 1
    On that last point: http://stackoverflow.com/questions/26807217/ – Frank May 26 '15 at 21:57

3 Answers3

9

This seems to be the most straightforward way to me:

ID[JN, (select) := mget(paste0('i.', select))]
eddi
  • 49,088
  • 6
  • 104
  • 155
  • `mget` seems to search in the wrong place. I get `Error: value for ‘i.value’ not found` on 1.9.4. Probably need to set the right environment. – Frank May 26 '15 at 23:09
  • 2
    Try with 1.9.5 - that's what I'm using – eddi May 26 '15 at 23:15
  • 1
    Okay, I can see that that issue has been fixed: https://github.com/Rdatatable/data.table/issues/994 Too much hassle to update the package at the moment (having to update rtools and such). – Frank May 26 '15 at 23:19
  • @Frank I believe you should be able to build package on separate machine and just install from compiled package (?) – jangorecki May 27 '15 at 08:20
  • `DT1[DT2, a:= get("i.b")]` uses much much more memory than `DT1[DT2, a:= i.b]` in my machine, am I doing something wrong? – user3226167 Apr 17 '18 at 12:05
6

In recent development version it has been made much easier

ID[JN, (select) := .list_of_fields,
   env=list(.list_of_fields=as.list(paste0('i.', select)))]

OLD solution before 1.14.1

Instead of mget or eval-parse there is still possibility to build the lookup call. While the mget is the most user friendly, this one is both flexible and actually corresponds to building the j expression.
Solution wrapped into batch.lookup helper function taking character vector of column names to lookup.

    library(data.table)
    set.seed(1)
    ID <- data.table(id = 1:3, meta = rep(1,3), key = "id")
    JN <- data.table(idd = sample(ID$id, 3, FALSE), value = sample(letters, 3, FALSE), meta = rep(1,3), key = "idd")
    select <- c("value","meta") # my fields to lookup
    batch.lookup = function(x) {
        as.call(list(
            as.name(":="),
            x,
            as.call(c(
                list(as.name("list")),
                sapply(x, function(x) as.name(paste0("i.",x)), simplify=FALSE)
            ))
        ))
    }
    batch.lookup(select)
    #`:=`(c("value", "meta"), list(value = i.value, meta = i.meta))
    ID[JN, eval(batch.lookup(select))][]
    #   id meta value
    #1:  1    1     x
    #2:  2    1     v
    #3:  3    1     f

To be fair this answer actually address call construction issue described by me as OP.

jangorecki
  • 16,384
  • 4
  • 79
  • 160
5

Here's the crude way:

myj <- parse(text=paste0("`:=`(",paste0(select,"=i.",select,collapse=","),")"))
ID[JN,eval(myj)]
#    id meta value
# 1:  1    1     x
# 2:  2    1     v
# 3:  3    1     f
Frank
  • 66,179
  • 8
  • 96
  • 180
  • 1
    @jangorecki There may be. I'm not well versed in `symbols`, `calls`, etc. and don't know what advantage that has over parsing characters (which is the format `select` starts in). – Frank May 26 '15 at 21:27
  • Memory usage does not explode using this method. Unlike `get()` method: `DT1[DT2, a:= get("i.b")]` – user3226167 Apr 19 '18 at 01:39
  • @user3226167 Yeah, `get` and `mget` may have some baggage in terms of loading stuff that might not be needed. Try `ID[JN, ls()]; ID[JN, {get("i.value"); ls()}]`. I don't really know the details, though. Jan's answer is a better way of building a call to eval than this one, fwiw. – Frank Apr 19 '18 at 02:02