I have the following code snippet:

data[match(tmp$key, data$key),][[name]] <- all_tmp[[name]]

It copies values from a data.table tmp into a data.table data, matching rows on key. The variable name holds the name of the column to copy.

However, it only updates the first occurrence of each key, because match() returns only the first matching position. The few posts I found on SO that use data.table were quite dated, so I am concerned they are no longer relevant to the latest version of data.table. Other posts did not use data.table.

Importantly, I want to reference the column name using a variable name as opposed to verbatim.

If the column were literally called "name", I suppose the following would work:

data[all_tmp, on="key", name:=i.name]

Source: https://stackoverflow.com/a/54568079/1515117

Thanks for the help.

Vince
  • `data[all_tmp, on="key", (name) := get(paste0("i.", name)) ]` or something like that? If you can provide sample data this will be easier to debug. – thelatemail Apr 14 '21 at 01:15

1 Answer

You can use get to grab the i.name variable programmatically in the update join, and stay within standard data.table join operations. Example data and code:

library(data.table)
data <- data.table(snp.gene.key=1:5, dval = letters[1:5])
all_tmp <- data.table(snp.gene.key=1:3, dval=letters[11:13])
setkey(data, snp.gene.key)
setkey(all_tmp, snp.gene.key)

data
#   snp.gene.key dval
#1:            1    a
#2:            2    b
#3:            3    c
#4:            4    d
#5:            5    e

Then wrap name in parentheses on the LHS of the := assignment so it is evaluated rather than treated as a literal column name, and use get on the RHS to fetch the i.-prefixed column you want for the update join.

name <- "dval"
data[all_tmp, (name) := get(paste0("i.", name)) ]
 
data
#   snp.gene.key dval
#1:            1    k
#2:            2    l
#3:            3    m
#4:            4    d
#5:            5    e
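
The question's concern about match() only hitting the first occurrence can be checked with a small sketch. The data below is hypothetical (key 2 is duplicated in data), and the join uses on= instead of setkey purely for illustration; it assumes a reasonably recent data.table where get() can see the i.-prefixed columns in an update join.

```r
library(data.table)

# Hypothetical data: snp.gene.key 2 appears twice in `data`
data    <- data.table(snp.gene.key = c(1L, 2L, 2L, 3L), dval = c("a", "b", "c", "d"))
all_tmp <- data.table(snp.gene.key = c(2L, 3L), dval = c("x", "y"))

name <- "dval"

# match() reports only the first position of each key in `data`
first_hits <- match(all_tmp$snp.gene.key, data$snp.gene.key)
first_hits
# [1] 2 4

# The update join assigns to every row of `data` that matches a row of `all_tmp`
data[all_tmp, on = "snp.gene.key", (name) := get(paste0("i.", name))]
data$dval
# [1] "a" "x" "x" "y"
```

Both rows with key 2 receive the new value, whereas the match()-based assignment would only have touched row 2.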
thelatemail
  • Thanks for the `paste/get` tip. Super useful! Doesn't work, ie columns not updating. Here is more specific code: `data[all_tmp, on="snp.gene.key", (ann$name) := get(paste0("i.",ann$name))]`. Note I am using a list to store the column name, but that doesn't seem problematic. The key is "snp.gene.key" which may be the issue. I tried to set the key in `setDT(data, key="snp.gene.key")` and `setDT(all_tmp, key="snp.gene.key")` but also didn't work. I will provide a sample soon. – Vince Apr 14 '21 at 01:40
  • @Vince - Check the updated code example. It definitely does update when using a keyed join as you describe. There might be something funny in your specific case, but a sample will clear that up for sure. (I didn't downvote you if you're wondering). – thelatemail Apr 14 '21 at 01:57
  • It works. It was an issue in how I sanity check whether it worked. Earlier in the code I create an additional column to store the original data, for example: `data[["name.old"]] <- data[["name"]]`. Turns out this seems to be done by reference and old value was updated to new value! When I check the original data file I see the old value, so the update took place. Odd that the assignment was by reference... – Vince Apr 14 '21 at 02:00
  • For posterity, assigning the old data column as you showed me fixes the "by reference" issue as well :) `old.col <- paste0(ann$name, ".old"); data[, (old.col) := get(ann$name)]` – Vince Apr 14 '21 at 02:14
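
The backup-then-update pattern from the last two comments can be sketched end to end with the answer's example data. The explicit copy() is an addition here, not something the comments use; it is a defensive assumption so that the backup column cannot share memory with the column being updated in place.

```r
library(data.table)

data    <- data.table(snp.gene.key = 1:5, dval = letters[1:5])
all_tmp <- data.table(snp.gene.key = 1:3, dval = letters[11:13])

name    <- "dval"
old.col <- paste0(name, ".old")   # "dval.old"

# Back up the column first; copy() guarantees an independent vector
data[, (old.col) := copy(get(name))]

# Update join: overwrite `dval` in `data` with the matching values from `all_tmp`
data[all_tmp, on = "snp.gene.key", (name) := get(paste0("i.", name))]

data$dval
# [1] "k" "l" "m" "d" "e"
data$dval.old
# [1] "a" "b" "c" "d" "e"
```

Comparing the two columns afterwards gives a sanity check that is not affected by in-place modification.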