This question is related to this one.

I need a parameterized function that implements the following code:

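library(data.table)
library(stringr)

dt <- data.table(text  = c("AAA 123 BBB", "1 CCC"),
                 text2 = c("AAA 123 BBB", "1 CCC"))

double_number <- function(x) x * 2
regex_identify_num <- "\\d+"

# Extract the first number in `text`, convert it to numeric and double it
dt[, calc_value := text |> str_extract(regex_identify_num) |> as.numeric() |> double_number()]
# Write the doubled value back into `text` (gsub replaces every match of the regex)
dt[, text := mapply(function(x, y) gsub(regex_identify_num, x, y, perl = TRUE), calc_value, text)]
# Drop the helper column
dt[, calc_value := NULL]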

The expected result is that the numbers in the selected column are doubled, as the code above does.
I tried:

change_value <- function(dt_, fun, regex, column_name) {
  #browser()
  dt_[, new_column := get(column_name) |> str_extract(get(regex)) |> fun()]
  dt_[, column_name := mapply(function(x, y) gsub(get(regex), x, y, perl = T), new_column, get(column_name))]
  dt_[, new_column := NULL]
  return(dt_)
}

change_value(dt, function(x) x |> as.numeric() |> double_number(), "regex_identify_num", "text")

But I get the error "Error in get(column_name) : first argument has length > 1". Interestingly, when debugging, get(column_name) |> str_extract(get(regex)) |> fun() seems to evaluate correctly (it returns 246 2).

How do I fix this?

PS: I also tried a different approach (programming on data.table with the env= argument), with no success:

change_value <- function(dt_, fun, regex, column_name) {
  #browser()
  dt_[, new_column:= column_name|> str_extract(regex) |> fun()
      , env = list(  fun = substitute(fun)
                   , column_name = substitute(column_name)
                   , regex = substitute(regex)
                   )
      ]
  dt_[, column_name := mapply(function(x, y) gsub(regex, x, y, perl = T), new_column, column_name)]
  dt_[, new_column := NULL]
  return(dt_)
}
Fabio Correa

2 Answers

The error does not occur on the first call, only on the second (and subsequent) calls:

dt
#           text       text2
#         <char>      <char>
# 1: AAA 123 BBB AAA 123 BBB
# 2:       1 CCC       1 CCC
change_value(dt, function(x) x |>
  as.numeric() |>
  double_number(), "regex_identify_num", "text")[]
#           text       text2 column_name
#         <char>      <char>      <char>
# 1: AAA 123 BBB AAA 123 BBB AAA 246 BBB
# 2:       1 CCC       1 CCC       2 CCC
dt
#           text       text2 column_name
#         <char>      <char>      <char>
# 1: AAA 123 BBB AAA 123 BBB AAA 246 BBB
# 2:       1 CCC       1 CCC       2 CCC
change_value(dt, function(x) x |>
  as.numeric() |>
  double_number(), "regex_identify_num", "text")[]
# Error in get(column_name) : first argument has length > 1

Part of this is that, due to data.table's reference semantics, your dt is modified in place. Perhaps this is by design. But the non-standard evaluation then behaves a little differently: dt_[, column_name := ...] without parentheses creates a column literally named column_name (the third column above). On the next call, get(column_name) is evaluated inside the data.table frame, where column_name exists both as a string in the function and as a column in the frame. The column (a character vector of length 2) wins, hence the error.
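A minimal sketch reproducing the clash outside the function; the table dt_demo and the toupper() transformation are illustrative only and not part of the original code:

```r
library(data.table)

dt_demo <- data.table(text = c("AAA 123 BBB", "1 CCC"))
column_name <- "text"

# Without parentheses the LHS of := is taken literally, so this adds a
# column called "column_name" instead of overwriting "text".
dt_demo[, column_name := toupper(get(column_name))]
names(dt_demo)

# Inside j, `column_name` now resolves to that new column (a length-2
# character vector) rather than to the string "text", so get() fails.
err <- tryCatch(dt_demo[, get(column_name)], error = conditionMessage)
err
```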

We can work around this by using data.table's ..varname prefix, which looks the variable up in the calling scope rather than among the columns.

change_value <- function(dt_, fun, regex, column_name) {
  #browser()
  dt_[, new_column := get(..column_name) |> str_extract(get(..regex)) |> fun()]
  dt_[, column_name := mapply(function(x, y) gsub(get(..regex), x, y, perl = TRUE), new_column, get(..column_name))]
  dt_[, new_column := NULL]
  return(dt_)
}
change_value(dt, function(x) x |>
  as.numeric() |>
  double_number(), "regex_identify_num", "text")[]
#           text       text2 column_name
#         <char>      <char>      <char>
# 1: AAA 123 BBB AAA 123 BBB AAA 246 BBB
# 2:       1 CCC       1 CCC       2 CCC
change_value(dt, function(x) x |>
  as.numeric() |>
  double_number(), "regex_identify_num", "text")[]
#           text       text2 column_name
#         <char>      <char>      <char>
# 1: AAA 123 BBB AAA 123 BBB AAA 246 BBB
# 2:       1 CCC       1 CCC       2 CCC

(Repeated runs no longer error.)

Another option would be to give your argument and your new column different names.

Factoid: the ..-referencing works in the j= part of data.table::[, but not in i=.
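A small sketch of the .. prefix in j=, using a hypothetical table dt_demo that, like dt after the first call, already has a column named column_name (only the j= behaviour is shown here):

```r
library(data.table)

# Illustrative table with a column literally named "column_name"
dt_demo <- data.table(text = c("AAA 123 BBB", "1 CCC"),
                      column_name = c("x", "y"))
column_name <- "text"

# ..column_name is looked up in the calling scope, not among the columns
selected <- dt_demo[, ..column_name]        # one-column data.table with "text"
values   <- dt_demo[, get(..column_name)]   # the "text" column as a vector
```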

r2evans

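Building on r2evans's observation, the key point is to wrap column_name in parentheses on the left-hand side of := when assigning the result back, so that data.table writes to the column whose name is stored in column_name instead of creating a column literally called column_name: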

dt_[, (column_name) := mapply(function(x, y) gsub(get(regex), x, y, perl = TRUE), new_column, get(column_name))]

This avoids the name clash altogether, so the .. prefix is no longer needed.

Then the full code becomes:

change_value <- function(dt_, fun, regex, column_name) {
  #browser()
  dt_[, new_column := get(column_name) |> str_extract(get(regex)) |> fun()]
  dt_[, (column_name) := mapply(function(x, y) gsub(get(regex), x, y, perl = TRUE), new_column, get(column_name))]
  dt_[, new_column := NULL]
  return(dt_)
}
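A self-contained sketch of this approach. Two details differ from the code above and are assumptions of this sketch: regex is passed as the pattern itself (e.g. "\\d+") rather than as the name of a global variable, so it is used directly instead of through get(regex), and the function returns dt_ invisibly instead of the global dt.

```r
library(data.table)
library(stringr)

# Sketch: `regex` is the pattern itself, not the name of a global variable.
change_value <- function(dt_, fun, regex, column_name) {
  # Helper column: first match of `regex` in the target column, passed to fun()
  dt_[, new_column := get(column_name) |> str_extract(regex) |> fun()]
  # (column_name) := writes to the column whose name is stored in column_name;
  # gsub() replaces every match in each row with that row's new value
  dt_[, (column_name) := mapply(function(x, y) gsub(regex, x, y, perl = TRUE),
                                new_column, get(column_name))]
  dt_[, new_column := NULL]
  invisible(dt_)  # dt_ is modified in place; returned for chaining
}

dt <- data.table(text  = c("AAA 123 BBB", "1 CCC"),
                 text2 = c("AAA 123 BBB", "1 CCC"))
double_number <- function(x) x * 2

change_value(dt, function(x) x |> as.numeric() |> double_number(), "\\d+", "text")
change_value(dt, function(x) x |> as.numeric() |> double_number(), "\\d+", "text")
dt
```

The second call now succeeds and doubles the numbers again (246 becomes 492, 2 becomes 4), while text2 is untouched and no stray column_name column is created.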
Fabio Correa