I am trying to use the names of the variables in .SD, but I can't manage to get them. In the toy example below, I need to append the suffix " by {z}" to every cell that contains an "a", where {z} stands for the name of the column the cell belongs to, and I need to do this for all columns. The input table and the desired output table are shown below.

library(data.table)
# Input 
ip <- data.table(x = c("ab", "cd", "ac", "de"),
                 y = c("fr", "ad", "fa", "we"))

ip[]
#>     x  y
#> 1: ab fr
#> 2: cd ad
#> 3: ac fa
#> 4: de we

# Desired Output table

op <- data.table(x = c("ab by x", "cd", "ac by x", "de"),
                 y = c("fr", "ad by y", "fa by y", "we"))
op[]
#>          x       y
#> 1: ab by x      fr
#> 2:      cd ad by y
#> 3: ac by x fa by y
#> 4:      de      we

One approach I thought could work is deparse(substitute(x)), as in the example below.

add_if_pattern <- function(x, pattern) {
  y <- deparse(substitute(x))
  fifelse(test = grepl(pattern, x),
          paste(x, "by",  y),
          x)
}

pattern <- "a"
z <- "blah"
q <- "bleh"
add_if_pattern(z, pattern) ## "a" is found, so the suffix is added
#> [1] "blah by z"
add_if_pattern(q, pattern) ## no "a", so the value is unchanged
#> [1] "bleh"

However, when I use that function inside lapply(.SD, ...) in data.table, it does something unexpected.

tp <- copy(ip)  # keep an untouched copy so ip can be restored later

vars <- names(ip)
ip[, (vars) := lapply(.SD, add_if_pattern, pattern)]
ip[]
#>               x            y
#> 1: ab by X[[i]]           fr
#> 2:           cd ad by X[[i]]
#> 3: ac by X[[i]] fa by X[[i]]
#> 4:           de           we

I don't want X[[i]] but the name of the original column, either x or y. I also tried names(.SD), but it seems to be out of scope inside the anonymous function, and I got the error shown below. Could you please give me a hand?

Thanks.

ip <- copy(tp)
ip[, (vars) := lapply(.SD,
                      \(x){
                        fifelse(test = grepl(pattern, x),
                                paste(x, "by",  names(.SD)[..x]),
                                x)
                      })]
#> Error in `[.data.table`(ip, , `:=`((vars), lapply(.SD, function(x) {: Variable 'x' is not found in calling scope. Looking in calling scope because this symbol was prefixed with .. in the j= parameter.
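
A note on the error above: the `..` prefix tells data.table to look a symbol up in the calling scope, outside the table, so `..x` cannot see the `x` argument of the anonymous function. A minimal sketch of what the prefix is meant for, where `dt` and `keep` are hypothetical names used only for illustration:

```r
library(data.table)

dt <- data.table(x = c("ab", "cd"), y = c("fr", "ad"))

# `keep` is defined in the calling scope, not as a column of dt
keep <- "y"

# `..keep` selects the column(s) whose names are stored in `keep`
res <- dt[, ..keep]
names(res)
```

Here `names(res)` is just `"y"`, because the prefix resolved `keep` outside the table.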

Created on 2022-08-30 with reprex v2.0.2

1 Answer

Consider passing the column name as an extra argument and then using Map to iterate over the columns and their names in parallel

add_if_pattern <- function(x, pattern, colnm) {
  # colnm: name of the column that x was taken from
  fifelse(test = grepl(pattern, x),
          paste(x, "by", colnm),
          x)
}

-testing

ip[, (vars) := Map(function(x, nm) add_if_pattern(x, pattern, nm),
                   .SD, names(.SD)),
   .SDcols = vars]

-output

> ip
         x       y
    <char>  <char>
1: ab by x      fr
2:      cd ad by y
3: ac by x fa by y
4:      de      we
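
As for why the first attempt printed `X[[i]]`: `lapply()` calls its function on the expression `X[[i]]`, so `substitute(x)` captures that expression rather than a column name. A minimal base R sketch of the same effect, where `show_arg` and `cols` are hypothetical names used only for illustration:

```r
# Hypothetical helper: returns the expression its argument was called with
show_arg <- function(x) deparse(substitute(x))

z <- "blah"
direct <- show_arg(z)              # called directly: the symbol z is captured

cols <- list(x = "ab", y = "fr")
looped <- lapply(cols, show_arg)   # called by lapply: the argument is X[[i]]

direct
looped$x
```

`direct` is `"z"`, while every element of `looped` is `"X[[i]]"`, which is the text that ended up in the table.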
akrun
  • That is fantastic! Thank you so much. Forgive my ignorance, but `Map` looks like old-style R programming to me. Is this the only way to do it? Is it because we are mapping over two variables? – R.Andres Castaneda Aug 30 '22 at 18:51
  • @R.AndresCastaneda you are using `substitute/deparse` to extract names, which works on a single column. In a loop it is not that direct, as you may have to use `parent.frame` etc. to return the name, which is cumbersome. This way is more direct and works on single or multiple columns – akrun Aug 30 '22 at 18:54
  • @R.AndresCastaneda `Map` is a standard programming option. You could still derive the names as shown [here](https://stackoverflow.com/questions/18508790/deparsesubstitutex-in-lapply), but I wouldn't use that – akrun Aug 30 '22 at 18:56
  • The anonymous function could be avoided by making the arguments explicit, right? `ip[, (vars) := Map(add_if_pattern, x = .SD, colnm = names(.SD), pattern = pattern)]` – R.Andres Castaneda Aug 31 '22 at 02:43
  • @R.AndresCastaneda yes, you are right – akrun Aug 31 '22 at 15:02
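
The explicit-argument form from the last two comments, put together as a self-contained sketch. It assumes the three-argument `add_if_pattern()` from the answer. The `for` loop with `set()` at the end is an alternative that is not part of the answer and is shown only for comparison; `res_map` and `res_set` are hypothetical names.

```r
library(data.table)

add_if_pattern <- function(x, pattern, colnm) {
  fifelse(grepl(pattern, x), paste(x, "by", colnm), x)
}

ip <- data.table(x = c("ab", "cd", "ac", "de"),
                 y = c("fr", "ad", "fa", "we"))
vars <- names(ip)
pattern <- "a"

# Map() pairs each column with its name; the length-one pattern is recycled
res_map <- copy(ip)
res_map[, (vars) := Map(add_if_pattern, x = .SD, colnm = names(.SD),
                        pattern = pattern),
        .SDcols = vars]

# Alternative (an assumption, not from the answer): loop over names with set()
res_set <- copy(ip)
for (v in vars) {
  set(res_set, j = v, value = add_if_pattern(res_set[[v]], pattern, v))
}

res_map[]
```

Both variants update the columns by reference and should produce the same table as the answer; the loop simply makes the pairing of column and name explicit.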