1

I'd like to (dynamically and in place) bring some columns from one data table into another. I have the following code which gives the desired output:

# Set up example data
dt <- as.data.table(iris)[, rowid:=.I]
dt1 <- dt[, .(rowid, Sepal.Length, Sepal.Width)]
dt2 <- dt[, .(rowid, Species)]
cols <- c('Sepal.Length', 'Sepal.Width')

# Join the datasets
dt2[dt1, (cols):= mget(cols), on='rowid', verbose=T]

However, this is slow on my real data, and the output from verbose states: "'(m)get' found in j. ansvars being set to all columns. Use .SDcols or a single j=eval(macro) instead. Both will detect the columns used which is important for efficiency."

I've tried various forms of .SDcols and eval, but they all either give errors or give inappropriate output (e.g. columns contain the column name as a string). e.g.:

dt2[dt1, (cols):= .SD, .SDcols=cols, on='rowid', verbose=T]
dt2[dt1, (cols):= eval(quote("list(Sepal.Length, Sepal.Width)")), on='rowid', verbose=T]
dt2[dt1, (cols):= eval(parse(text="list(Sepal.Length, Sepal.Width)")), on='rowid', verbose=T]

NB. This question is similar to .SD and .SDcols for the i expression in data.table join , however the accepted answer in that case is to use mget, and I'm trying to find an alternative to that which is more efficient.

MattB
  • 651
  • 3
  • 11
  • Here is an option using .SD: `dt2[, (cols) := dt1[dt2, .SD, .SDcols = cols, on='rowid'], verbose=T]` but I am unsure about efficiency. – s_baldur Jan 13 '21 at 13:06
  • In this example, your `dt1` has *just* the columns you need (`cols` columns plus the `"rowid"`), so I can't help but think that this is over-engineering something that can be resolved with `dt2[dt1, on = .(rowid)]`. If there are other columns present (that you do not want to merge in), then `dt2[dt1[,c("rowid", cols), with=FALSE], on = .(rowid)]` seems to work well, too. – r2evans Jan 13 '21 at 14:56
  • `quote("list(Sepal.Length, Sepal.Width)")` is incorrect use of `quote`, you need to remove inner `"` – jangorecki Jan 14 '21 at 13:52
  • Does this answer your question? [How can one work fully generically in data.table in R with column names in variables](https://stackoverflow.com/questions/24833247/how-can-one-work-fully-generically-in-data-table-in-r-with-column-names-in-varia) – jangorecki Jan 14 '21 at 13:54
  • Also potentially helpful: [Assigning/Referencing a column name in data.table dynamically in i and j](https://stackoverflow.com/questions/60818053/assigning-referencing-a-column-name-in-data-table-dynamically-in-i-j-and-by/60818152#60818152) – Ian Campbell Jan 14 '21 at 13:59
  • Thank you for the comments (and yes, my real data has many more columns I don't want to include). The suggested other questions are both useful (in fact I'd already been through them both before asking) but neither give me something that I can use in this example (as far as I can tell). – MattB Jan 14 '21 at 14:27
  • @jangorecki Can you give an example on how to use `quote`, but use column names dynamically derived from the `cols` variable? I've tried many variations of this but can't figure out a way of doing it without typing them all out. – MattB Jan 14 '21 at 14:28
  • `quote` does not let you to make input dynamic, use `substitute` instead. Alternatively `as.call(lapply(c("list", cols), as.name))`. – jangorecki Jan 14 '21 at 15:02
  • Thank you for helping. I've tried variations of `dt2[dt1, (cols):= eval(as.call(lapply(c("list", cols), as.name))), on='rowid', verbose=T]` but just get the error: `object 'Sepal.Length' not found`. – MattB Jan 14 '21 at 16:45

0 Answers0