Changing expression calculation inside a function changes output

Question

This question is related to this one.

Given a data.table and an expression, the function below creates a new column, by reference, holding the result of the expression:

library(data.table)

dat <- data.table(x = 1:4, y = 5:8)

new_column_1 <- function(df, col_name, expr) {
  col_name <- deparse(substitute(col_name))  # symbol z -> string "z"
  expr1 <- substitute(expr)                  # unevaluated call x + y
  df[, (col_name) := eval(expr1
                          , df, parent.frame()    # THIS LINE CAN BE DROPPED
                          )]
}

new_column_1(dat, z, x + y)
dat
      x     y     z
   <int> <int> <int>
1:     1     5     6
2:     2     6     8
3:     3     7    10
4:     4     8    12
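The capture-then-evaluate pattern in new_column_1 does not depend on data.table itself. As a minimal base R sketch, the hypothetical new_column_base below uses a data.frame, so it returns a modified copy rather than updating by reference. substitute() runs in the function's own frame, and eval() then looks names up in df first and in the caller's frame second (the variable offset is also hypothetical, chosen to exercise that second lookup):

```r
# Hypothetical base R analogue of new_column_1 (returns a copy; no := by reference)
new_column_base <- function(df, col_name, expr) {
  col_name <- deparse(substitute(col_name))  # symbol z -> string "z"
  expr1 <- substitute(expr)                  # unevaluated call, e.g. x + y
  # envir = df: the columns x and y are found first;
  # enclos = parent.frame(): any other name is looked up in the caller's frame
  df[[col_name]] <- eval(expr1, df, parent.frame())
  df
}

dat_base <- data.frame(x = 1:4, y = 5:8)
offset <- 100  # not a column, so it is resolved through enclos

res <- new_column_base(dat_base, z, x + y)
res$z                                        # 6 8 10 12
new_column_base(dat_base, w, x + offset)$w   # 101 102 103 104
```

Because the data.frame is copied, dat_base itself still has only the columns x and y afterwards.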

However, if we inline the code for expr1 directly inside eval(), it no longer works:

dat <- data.table(x = 1:4, y = 5:8)

new_column_2 <- function(df, col_name, expr) {
  col_name <- deparse(substitute(col_name))
  df[, (col_name) := eval(substitute(expr)        # expr1 code inserted here
                          , df, parent.frame()    # CANNOT DROP THIS LINE
                          )]
}
new_column_2(dat, z, x + y)
dat

       x     y                                          z
   <int> <int>                                     <call>
1:     1     5 eval(substitute(expr), df, parent.frame())
2:     2     6 eval(substitute(expr), df, parent.frame())
3:     3     7 eval(substitute(expr), df, parent.frame())
4:     4     8 eval(substitute(expr), df, parent.frame())

What is going on? I would expect the same result.

Note: in the first case eval()'s extra arguments can be dropped, but not in the second case, where dropping them causes an error.

score 2 · Answer 1 · edited Mar 12 '22 at 09:54

This is because expr does not exist in the df environment, while expr() is actually an existing function, which is what you end up substituting. After renaming the argument from "expr" to "test", you get a much more informative error message:

new_column_2 <- function(df, col_name, test) {
  col_name <- deparse(substitute(col_name))
  df[, (col_name) := eval(substitute(test), df, parent.frame())]
}
new_column_2(dat, z, x + y)

Error in eval(substitute(test), df, parent.frame()) : object 'test' not found

To solve this issue, you might want to look at eval and quote in data.table. Also, I don't know exactly what you are trying to achieve, but it seems you are reinventing solutions that data.table already implements for you.
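A related way to see why inlining substitute() changes the result: substitute() only replaces names bound in the one environment where it is evaluated, and it does not search parent environments. In new_column_1 it runs in the function's frame, where expr is a promise holding x + y; inside j, data.table evaluates it in an environment of its own. The base R sketch below mimics that with a hypothetical child environment (a simplification, not data.table's actual internals; the helper names are made up for illustration):

```r
# substitute() called directly in the function's frame: it sees the promise for expr
capture_here <- function(expr) {
  substitute(expr)
}

# substitute() evaluated in a different environment whose parent is the
# function's frame (a rough stand-in for how j is evaluated). substitute()
# only checks that one environment, so expr is returned as a bare symbol.
capture_elsewhere <- function(expr) {
  other_env <- new.env(parent = environment())
  eval(quote(substitute(expr)), other_env)
}

capture_here(x + y)       # x + y
capture_elsewhere(x + y)  # expr
```

A bare symbol reaching eval() is consistent with the object 'test' not found error above.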

Great link with plenty of valuable information, thank you. My only purpose is to deeply understand the mechanics of eval and the scope of arguments when we use data.tables inside functions. – Fabio Correa, Mar 12 '22 at 14:30

score 1 · Answer 2 · answered Mar 12 '22 at 09:58


You could use rlang::enexpr to capture the expression passed to the function:

new_column_2 <- function(df, col_name, expr) {
  expr <- rlang::enexpr(expr)
  col_name <- deparse(substitute(col_name))
  df[, (col_name) := eval(expr)]
}
new_column_2(dat, z, x + y)

dat
       x     y     z
   <int> <int> <int>
1:     1     5     6
2:     2     6     8
3:     3     7    10
4:     4     8    12


Waldi

`enexpr` works perfectly. Actually, in `new_column_2` you can skip the line `expr <- rlang::enexpr(expr)` and insert it directly in `df[, (col_name) := eval(rlang::enexpr(expr))]`. Thank you. – Fabio Correa Mar 12 '22 at 14:49

score 0 · Answer 3 · edited Mar 12 '22 at 15:00

As per @Waldi's suggestion, rlang::enexpr fixes the problem in new_column_2:

new_column_2_FIXED <- function(df, col_name, expr) {
  col_name <- deparse(substitute(col_name))
  df[, (col_name) := eval(rlang::enexpr(expr))]
}
new_column_2_FIXED(dat, z, x + y)
dat

More on substitute vs enexpr can be found here.

edited Mar 12 '22 at 15:00

answered Mar 12 '22 at 14:54

Fabio Correa

