refer to quoted column name in a function in R

Question

I want to use the na_omit function from the collapse package in a user-defined function. na_omit requires a column name to be in quotes as one of its arguments. If I didn't need the column name in quotes, I could just refer to the column name in double braces, {{col}}, as mentioned in this vignette, "Programming with dplyr". If I refer to the column using the glue package, such as glue::glue("{col}"), I receive errors.

Here is a reprex:

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

library(collapse)
library(dplyr)
library(glue)

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) #Here is the code that fails
}

my_func(my_df, color_code)

The expected output can be generated with the following:

my_df %>% 
  collapse::na_omit(cols = c("color_code"))

and should produce:

#  color_code  color
#1        V9G   Blue
#2        J4C  White
#3        F7B Orange
#4        G3V  Green

How should I pass a column name as a parameter of a user-defined function in R when the function called inside it needs that name as a quoted string?

have you read https://dplyr.tidyverse.org/articles/programming.html? — r2evans, Feb 02 '22 at 21:47
try `function(df, col) { col <- as.character(substitute(col)); ...; }` — r2evans, Feb 02 '22 at 21:50
What error do you get? When I try to copy your code, I just get an `Error: object 'color_code' not found ` error, which goes away when I pass in `"color_code"` as a string and returns what I think is the expected output — divibisan, Feb 02 '22 at 21:56
Sebastian's solution is exactly what I was looking for. In particular, `as.character(substitute())`. I didn't see this in the dplyr vignette. I'll suggest an addition. — Ted M., Feb 03 '22 at 14:05
Suggesting a change to the *dplyr* vignette won't be appropriate in this case, as I now recognize. [See my comment to Sebastian's solution about what needs to be done for an unquoted column name.](https://stackoverflow.com/questions/70963235/refer-to-quoted-column-name-in-a-function-in-r#comment125471630_70964120) — Ted M., Feb 03 '22 at 18:25

score 2 · Accepted Answer · edited Feb 06 '22 at 13:17

In general, collapse is mostly standard evaluation and its NSE features are based upon base R, so most of the rlang, glue stuff, {{ }}, etc. won't work, but you will have simpler and faster code. For base R NSE functional programming, see http://adv-r.had.co.nz/Computing-on-the-language.html.

As suggested by r2evans, for a single column, a solution would be:

my_func <- function(df, col) { 
  col_char_ref <- as.character(substitute(col))
  df %>% 
    collapse::na_omit(cols = col_char_ref)
}

i.e. use substitute() to capture the expression and as.character or all.vars to extract the variables. For multiple columns a general solution is wrapping fselect e.g.

library(collapse)
my_func <- function(df, ...) {
  cols <- fselect(df, ..., return = "indices")
  na_omit(df, cols = cols) 
}

my_func(wlddev, PCGDP:GINI, POP) |> head()
#>   country iso3c       date year decade                region
#> 1 Albania   ALB 1997-01-01 1996   1990 Europe & Central Asia
#> 2 Albania   ALB 2003-01-01 2002   2000 Europe & Central Asia
#> 3 Albania   ALB 2006-01-01 2005   2000 Europe & Central Asia
#> 4 Albania   ALB 2009-01-01 2008   2000 Europe & Central Asia
#> 5 Albania   ALB 2013-01-01 2012   2010 Europe & Central Asia
#> 6 Albania   ALB 2015-01-01 2014   2010 Europe & Central Asia
#>                income  OECD    PCGDP LIFEEX GINI       ODA     POP
#> 1 Upper middle income FALSE 1869.866 72.495 27.0 294089996 3168033
#> 2 Upper middle income FALSE 2572.721 74.579 31.7 453309998 3051010
#> 3 Upper middle income FALSE 3062.674 75.228 30.6 354950012 3011487
#> 4 Upper middle income FALSE 3775.581 75.912 30.0 338510010 2947314
#> 5 Upper middle income FALSE 4276.608 77.252 29.0 335769989 2900401
#> 6 Upper middle income FALSE 4413.297 77.813 34.6 260779999 2889104

Created on 2022-02-03 by the reprex package (v2.0.1)
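The difference between the extraction helpers mentioned above can be seen with base R alone. This is only a minimal sketch: `capture_names()` is a hypothetical helper written for illustration and is not part of collapse.

```r
# Hypothetical helper (not from collapse): three ways to turn a captured,
# unquoted argument into character data.
capture_names <- function(col) {
  expr <- substitute(col)  # the unevaluated expression the caller typed
  list(
    as_character = as.character(expr),
    deparsed     = deparse(expr),
    all_vars     = all.vars(expr)
  )
}

# For a bare name, all three approaches return the string "color_code".
single <- capture_names(color_code)

# For a call such as PCGDP:GINI they differ: as.character() splits the call
# into its pieces (including the `:` operator), deparse() returns the source
# text, and all.vars() keeps only the variable names.
ranged <- capture_names(PCGDP:GINI)
```

For a single bare column name the choice between them makes no practical difference; it only starts to matter once the argument can be an expression.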

I could alternatively use `deparse(substitute())` to get the same desired output. Is this preferred? — Ted M., Feb 03 '22 at 17:39
I've clearly been living in a *dplyr* world for most of my R life. Even to achieve output like `my_df %>% collapse::ftransform(count = stringr::str_length(color))` in a user-defined function, I need to use something like the following, which seems quite verbose:`my_func <- function(df, col){env <- list2env(df, parent = parent.frame()); col <- substitute(col); df %>% collapse::ftransform(count = stringr::str_length(eval(col, env))) }; my_func(my_df, color)` — Ted M., Feb 03 '22 at 18:15

score 0 · Answer 2 · answered Feb 02 '22 at 22:13

You have to pass the column name as a character string, for example:

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) # glue("{col}") resolves to the string passed in col
}

my_func(my_df, col = "color_code")


Grzegorz Sapijaszko


Ted M. · Answer 3 · 2022-02-08T19:04:06.750

It's important to first determine which framework you're programming in. Are you using dplyr or base R? If dplyr, refer to the documentation for programming with dplyr, rlang, and glue, and this stackoverflow answer. If base R, refer to the documentation on non-standard evaluation, especially wrapping the argument in as.character(substitute()) when the called function needs a quoted column name, and wrapping the function call in eval(substitute()) when it takes an unquoted column name.

Note that both of the approaches above involve non-standard evaluation. Another approach is to use standard evaluation (or some "combination" of standard evaluation and non-standard evaluation). For example, see the issue raised in this link.
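To make the standard-evaluation idea concrete, here is a minimal base R sketch in which the column name is always a character string. `drop_na_rows()` and `toy_df` are hypothetical names, and the row filter is a plain base R stand-in for collapse::na_omit, not its implementation.

```r
# Small illustrative data set (hypothetical, modelled on my_df)
toy_df <- data.frame(
  color_code = c("V9G", NA, "J4C"),
  color      = c("Blue", "Red", "White"),
  stringsAsFactors = FALSE
)

# Standard evaluation: `col` is an ordinary value, so nothing is captured
# or substituted; the string is used directly to index the data frame.
drop_na_rows <- function(df, col) {
  stopifnot(is.character(col), length(col) == 1L, col %in% names(df))
  df[!is.na(df[[col]]), , drop = FALSE]
}

kept <- drop_na_rows(toy_df, "color_code")  # keeps the V9G and J4C rows
```

Because the argument is an ordinary string, it can also come from a variable or a loop over column names, which is harder to arrange with the substitute()-based versions.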

This question arose, at least partially, from confusing the two frameworks. Here are some of the different approaches in a reprex.

Data

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

Packages

library(collapse)
library(dplyr)
library(stringr)
library(glue)

Functional Programming in base R (non-standard evaluation)
with a quoted column name:

my_func <- function(df, col) {
  col_char_ref <- as.character(substitute(col)) #Use as.character(substitute()) to refer to a quoted column name
  df %>% 
    collapse::na_omit(cols = col_char_ref) 
}

my_func(my_df, color_code)

#Should generate output below
my_df %>% 
  collapse::na_omit(cols = "color_code")

and with a non-quoted column name:

my_func <- function(df, col){
  df <- df # This makes sure "df" is available inside the function environment where we evaluate the ftransform expression
  eval(substitute(collapse::ftransform(df, count = stringr::str_length(col)))) # Wrap the function to be evaluated in eval(substitute())
}

my_func(my_df, color)

#Should generate output below
my_df %>% 
  collapse::ftransform(count = stringr::str_length(color))
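A lighter-weight variant of the same idea, sketched here in base R only, is to capture just the column expression and evaluate it with the data frame as the evaluation environment. `add_count()` and `toy_df` are hypothetical names, and `nchar()` stands in for stringr::str_length so the example has no package dependencies.

```r
# Small illustrative data set (hypothetical)
toy_df <- data.frame(color = c("Blue", "Red", "White"),
                     stringsAsFactors = FALSE)

add_count <- function(df, col) {
  col_expr <- substitute(col)                    # capture the unquoted name
  # Look the name up among df's columns first, then in the caller's environment
  values <- eval(col_expr, df, parent.frame())
  df$count <- nchar(values)
  df
}

with_count <- add_count(toy_df, color)  # count is 4, 3, 5
```

Only the argument is evaluated non-standardly here; the rest of the function is ordinary code, which can make it easier to debug than wrapping a whole call in eval(substitute()).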

Functional programming in dplyr (non-standard evaluation)
with a quoted column name using glue and dplyr functions:

my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := glue("color code: {pull(., {{col1}})}; color: {pull(., {{col2}})}"))
}

my_func(my_df, color_code, color)

#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))

or with a quoted column name using sprintf(), an R wrapper for the C function of the same name:

my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := sprintf("color code: %s; color: %s", {{col1}}, {{col2}}))
}

my_func(my_df, color_code, color)

#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))

and with a non-quoted column name:

my_func <- function(df, col){
  df %>%  
    dplyr::mutate(count = stringr::str_length({{ col }}))
}

my_func(my_df, color)

#Should generate output below
my_df %>% 
  dplyr::mutate(count = stringr::str_length(color))
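If the column name arrives as a character string rather than a bare name, the "Programming with dplyr" vignette describes the `.data` pronoun for that case. The sketch below assumes dplyr is installed; `count_chars()` and `toy_df` are hypothetical names, and `nchar()` stands in for stringr::str_length.

```r
library(dplyr)

# Small illustrative data set (hypothetical)
toy_df <- data.frame(color = c("Blue", "Red", "White"),
                     stringsAsFactors = FALSE)

count_chars <- function(df, col) {
  stopifnot(is.character(col), length(col) == 1L)
  # .data[[col]] looks up the column whose name is stored in `col`
  df %>% mutate(count = nchar(.data[[col]]))
}

res <- count_chars(toy_df, "color")  # count is 4, 3, 5
```

This keeps the function in standard evaluation from the caller's point of view while still working inside dplyr verbs.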

Correcting error-producing code
The following code that produces an error provides a motivation for the two examples below:

my_func <- function(df, col){
  df <- df
  df %>%  
    collapse::na_omit(cols = as.character(substitute(col))) %>% 
    eval(substitute(collapse::ftransform(description = stringr::str_length(col))))
}

my_func(my_df, color_code)

#Error in ckmatch(cols, nam) : Unknown columns: col

The examples below are alternatives that do not produce errors.

Functional Programming in base R (standard evaluation - requires column to be passed as character string in function)

library(pkgcond)

my_func <- function(df, col) {
  if (!is.character(substitute(col)))
    pkgcond::pkg_error("col must be a quoted string") #if users aren't used to quoted strings as inputs to a function
  df <- na_omit(df, cols = col) 
  df$count <- stringr::str_length(.subset2(df, col))
  df
}

my_func(my_df, "color_code")

#Should generate output below
my_df %>% 
  na_omit(cols = "color_code") %>% 
  ftransform(count = stringr::str_length(color_code))

Functional Programming in base R ("combination" of standard evaluation and non-standard evaluation)

my_func <- function(df, col){
  df <- df
  df <- collapse::na_omit(df, cols = as.character(substitute(col))) # Unlike the code with the error, the function is not piped (using %>%)
  eval(substitute(collapse::ftransform(df, description = stringr::str_length(col))))
}

my_func(my_df, color_code)

#Should generate output below
my_df %>% 
  na_omit(cols = "color_code") %>% 
  ftransform(description = stringr::str_length(color_code))

More complex examples using the collapse package can be referenced at this link.
