
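I have a data.table and a named list of formulas stored as strings. I want to add one new column per formula, named after the corresponding list element: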

library(data.table)
DT <- data.table(A = 1:3, B = 3:1, C = 4:6, D = 6:4)
l <- list(f1 = "A + B", f2 = "B + C", f3 = "C - D", f4 = "D / A")

Written out by hand, this can be achieved with

DT[, ":="(f1 = A + B, f2 = B + C, f3 = C - D, f4 = D / A)]

or

# loop over the formulas, adding one column per iteration
for (i in seq_along(l)) {
  DT[, (names(l)[i]) := eval(parse(text = l[[i]]))]
}

Is there a way to do this using the information in l without a loop?

# some code that uses l
DT
#    A B C D f1 f2 f3       f4
# 1: 1 3 4 6  4  7 -2 6.000000
# 2: 2 2 5 5  4  7  0 2.500000
# 3: 3 1 6 4  4  7  2 1.333333
MichaelChirico
morningfin
  • What is the problem with the loop here? Not all loops are bad. – dayne Jul 28 '16 at 19:17
  • I would just like to know if there is a way to avoid the loop... – morningfin Jul 28 '16 at 19:18
  • You can use `lapply` to avoid loops over columns, generally, but it's not always a big deal to just use a loop as dayne said. First, I guess you should store these as expressions, not text: `L = lapply(l, function(x) parse(text=x))`. Then, something like `DT[, \`:=\`(names(L), lapply(L, eval, .SD))]`, which works, but I'm not sure is kosher. – Frank Jul 28 '16 at 19:25
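A runnable version of the `lapply` idea from the comment above might look like the sketch below. It assumes the same `DT` and `l` as in the question, and uses the parenthesised `(names(L)) :=` form on the left-hand side instead of the functional `` `:=`(...) `` form, which should behave the same here.

```r
library(data.table)

DT <- data.table(A = 1:3, B = 3:1, C = 4:6, D = 6:4)
l  <- list(f1 = "A + B", f2 = "B + C", f3 = "C - D", f4 = "D / A")

# Parse each formula string once into an expression, keeping the names f1..f4
L <- lapply(l, function(x) parse(text = x))

# Evaluate every parsed expression with the columns of .SD in scope,
# then assign all results by reference in a single `:=`
DT[, (names(L)) := lapply(L, eval, .SD)]
DT
```

Storing parsed expressions rather than strings means each formula is parsed once, not once per evaluation.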

2 Answers


If you are constructing l by hand, write it as a quoted call instead:

L = quote(`:=`(f1 = A + B, f2 = B + C, f3 = C - D, f4 = D / A))

Then you can use it like this:

DT[, eval(L)]

#    A B C D f1 f2 f3       f4
# 1: 1 3 4 6  4  7 -2 6.000000
# 2: 2 2 5 5  4  7  0 2.500000
# 3: 3 1 6 4  4  7  2 1.333333

This is the practice recommended in the data.table FAQ, which explains...

quote() and eval() are like macros in other languages.
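If `l` already exists as a list of strings, the same quoted call can probably be assembled programmatically instead of typed out. The sketch below follows the idea Frank gives further down in the comments, but uses `str2lang()` (available since R 3.6.0) and plain function calls instead of magrittr pipes; the helper name `rhs` is just illustrative.

```r
library(data.table)

DT <- data.table(A = 1:3, B = 3:1, C = 4:6, D = 6:4)
l  <- list(f1 = "A + B", f2 = "B + C", f3 = "C - D", f4 = "D / A")

# Convert each formula string into an unevaluated call, e.g. quote(A + B)
rhs <- lapply(l, str2lang)

# Prepend the `:=` symbol and turn the named list into a call:
# `:=`(f1 = A + B, f2 = B + C, f3 = C - D, f4 = D / A)
L <- as.call(c(list(as.name(":=")), rhs))

# Same usage as with the hand-written quote()
DT[, eval(L)]
DT
```

The list names (`f1` to `f4`) become the argument names of the call, which is what makes `:=` use them as column names.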

Frank
  • I would like to eval all these formulas by group, but data.table does not allow me to do so. How should I do that? Thank you in advance! – morningfin Sep 20 '16 at 03:19
  • @Nal-rA You may need to post a new question to clarify. When I try it, it works for me: `DT[, id := c(1,1,2)]` then `DT[, eval(L), by=id][]` – Frank Sep 20 '16 at 04:42
  • Thanks! It seems I was doing it the wrong way. Thank you again! – morningfin Sep 20 '16 at 04:56
  • @Frank what would be the recommended approach if we have vectors of operations stored in a list? Ex: `operations_list <- list(1 == "var1*var2*var3", 2 == "var1*var3*var8")`. The following code I tried doesn't seem to reference the variables named in the list elements: `L = quote(\`:=\`(prod = operations_list[[1]])); setDT(df)[, eval(L)]$prod` returns "NAs introduced by coercion" after R attempts to coerce the RHS character vector to double to match the result (`prod`). – On_an_island Dec 24 '22 at 17:56
  • @On_an_island I'm not sure I understand, since as you noted, the list as written evaluates to FALSE from the start when it tests `==`. Maybe something like `ex = expression(list(1==var1*var2*var3, 2==var1*var3*var8)); DT[, eval(ex)]`? I think `expression` or `quote` should work – Frank Dec 27 '22 at 01:22
  • @Frank that was a poorly written comment on my part. I have a list with variable operations `ops_list=list(c("var1*var2*var3"), c("var1*var3*var8"))` that I need to use on a `DT` to see if it's faster than my current approach with `map`. I was attempting to sequence along each element of `ops_list` and using `:=` generate a new variable based on the operation in the respective list element, and store it as a vector, i.e., `DT[,"new_var" := ops_list[[1]] ]$new_var`, `DT[,"new_var" := ops_list[[2]] ]$new_var`, and so on. – On_an_island Dec 28 '22 at 22:31
  • @On_an_island I think your example still isn't fully what you want -- not sure why $newvar is appended. But working from your input, you can build the expression like `ex = ops_list %>% setNames(c("v1", "v2")) %>% lapply(str2lang) %>% c(as.name(":="), .) %>% as.call` (with magrittr syntax) -- Maybe you can post a new question if it's not clear? – Frank Dec 29 '22 at 18:13
  • @Frank. I posted a new question along with the `map` and `lapply` code that illustrates what I was trying to explain in my comment. https://stackoverflow.com/questions/75252746/optimize-a-large-number-of-variable-operations-and-variable-ordering – On_an_island Feb 01 '23 at 03:30
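A small sketch of the grouped evaluation Frank describes in the comments above. The grouping column `id` comes from his comment; the extra `share = A / sum(A)` formula is a hypothetical addition, included only because its value depends on the group, whereas the row-wise formulas f1 to f4 give the same result with or without `by`.

```r
library(data.table)

DT <- data.table(A = 1:3, B = 3:1, C = 4:6, D = 6:4)
DT[, id := c(1, 1, 2)]   # grouping column, as in the comment

# Quoted `:=` call; `share` is a hypothetical group-dependent formula
L <- quote(`:=`(f1 = A + B, f2 = B + C, f3 = C - D, f4 = D / A,
                share = A / sum(A)))

# Evaluate the whole call separately within each group of id
DT[, eval(L), by = id][]
```

Rows 1 and 2 (id 1) should get shares of 1/3 and 2/3, and row 3 (id 2) a share of 1.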

This is super sloppy, but you can build an expression using call, parse, and paste, and then evaluate that expression:

library(data.table)
DT <- data.table(A = c(1:3), B = c(3:1), C = c(4:6), D = (6:4))
l <- list(f1 = "A + B", f2 = "B + C", f3 = "C - D", f4 = "D / A")
ncall <- call(":=", names(l), 
          parse(text = paste0("list(", paste(l, collapse = ","), ")")))
DT[ , eval(ncall)]
DT
#    A B C D f1 f2 f3         f4
# 1: 1 3 4 6  4  7 -2 6.00000000
# 2: 2 2 5 5  4  7  0 2.50000000
# 3: 3 1 6 4  4  7  2 1.33333333
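A slightly tidier variant of the same idea, sketched below, builds the right-hand side with `str2lang()` (R 3.6.0 or later) so that the call holds a plain `list(...)` call rather than an `expression` object. `DT` and `l` are recreated so the block runs on its own; the name `rhs` is illustrative.

```r
library(data.table)

DT <- data.table(A = 1:3, B = 3:1, C = 4:6, D = 6:4)
l  <- list(f1 = "A + B", f2 = "B + C", f3 = "C - D", f4 = "D / A")

# Paste the formulas into "list(A + B, B + C, C - D, D / A)" and parse it into one call
rhs <- str2lang(paste0("list(", paste(l, collapse = ", "), ")"))

# `:=`(c("f1", "f2", "f3", "f4"), list(A + B, B + C, C - D, D / A))
ncall <- call(":=", names(l), rhs)

DT[, eval(ncall)]
DT
```

Here the column names travel as a character vector in the first argument of `:=`, rather than as argument names as in the quoted-call approach.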
dayne
  • Can your method eval all the formulas by group? Thank you! @dayne – morningfin Sep 20 '16 at 05:16
  • You should be able to add a 'k' term and eval by group, e.g. `DT[, eval(ncall), by = list(D)]`. I have not explicitly tested this, and this example does not make sense for grouping, but you can certainly test some code out yourself. – dayne Sep 20 '16 at 10:47