Use dynamic variable names in dplyr case_when()

Question

I have a function in R which uses case_when:

library(dplyr)

myfunction <- function(df, col, case_name, cntl_name) {
  object <- df %>%
    mutate(
      class = case_when(
        col == case_name ~ 1,
        col == cntl_name ~ 0
      )
    )
  return(object)
}

So if I have this object:

df <- structure(list(id = c("ID1", "ID2", "ID3", "ID4", "ID5"),
                     phenotype = c("blue", "blue", "red", "green", "red"),
                     treatment = c("treat1", "treat2", "none", "none", "none"),
                     weeks_of_treatment = c(0, 0, 0, 0, 0)),
                row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"),
                class = "data.frame")

> df
     id phenotype treatment weeks_of_treatment
ID1 ID1      blue    treat1                  0
ID2 ID2      blue    treat2                  0
ID3 ID3       red      none                  0
ID4 ID4     green      none                  0
ID5 ID5       red      none                  0

And run:

newdf <- myfunction(df, "phenotype", "red", "blue")

It should return a data frame that looks like this:

   id phenotype treatment weeks_of_treatment class
1 ID1      blue    treat1                  0     0
2 ID2      blue    treat2                  0     0
3 ID3       red      none                  0     1
4 ID4     green      none                  0    NA
5 ID5       red      none                  0     1

But it doesn't - it returns this:

> newdf
   id phenotype treatment weeks_of_treatment class
1 ID1      blue    treat1                  0    NA
2 ID2      blue    treat2                  0    NA
3 ID3       red      none                  0    NA
4 ID4     green      none                  0    NA
5 ID5       red      none                  0    NA

It does not recognise `col` as referring to the column phenotype. Does anyone know how to pass a dynamic variable name into case_when?

I have tried other solutions for variable column names in dplyr (e.g. wrapping col in double brackets, [[col]]), but I can't find one that works.

I think you were closer to a solution with `[[col]]`: `{{col}}` might have worked. The issue is, I suspect, tidyverse's use of non-standard evaluation, or [NSE](https://dplyr.tidyverse.org/articles/programming.html). — Limey, Aug 04 '20 at 11:32
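A minimal sketch of why the original version returns only NA, assuming the same call style as the question (column name passed as a string). The function `f()` and the extra `seen` column are hypothetical, added only to show what `col` evaluates to inside `mutate()`:

```r
library(dplyr)

df <- data.frame(phenotype = c("blue", "red", "green"))

# Same logic as the question's myfunction(), called with a string.
f <- function(df, col) {
  df %>%
    mutate(
      # df has no column named `col`, so the data mask falls back to the
      # function argument: each condition is the constant "phenotype" == "red"
      # (or == "blue"), which is FALSE, so case_when() yields NA
      class = case_when(
        col == "red" ~ 1,
        col == "blue" ~ 0
      ),
      # what `col` actually evaluates to inside mutate()
      seen = col
    )
}

out <- f(df, "phenotype")
out
```

Because no condition is ever TRUE, every row falls through to NA, which matches the output shown in the question.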

score 2 · Accepted Answer · answered Aug 04 '20 at 11:39

myfunction <- function(df, col, case_name, cntl_name) {
  object <- df %>%
    mutate(
      class = case_when(
        {{col}} == case_name ~ 1,
        {{col}} == cntl_name ~ 0
      )
    )
  return(object)
}

myfunction(df, phenotype, "red", "blue")
   id phenotype treatment weeks_of_treatment class
1 ID1      blue    treat1                  0     0
2 ID2      blue    treat2                  0     0
3 ID3       red      none                  0     1
4 ID4     green      none                  0    NA
5 ID5       red      none                  0     1
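The call above passes phenotype unquoted. With `{{col}}`, passing the string "phenotype" would inject that string as a constant and give all NA again. If you want to keep the string interface from the question, one option is the `.data` pronoun. A minimal sketch, where `myfunction_chr()` is a hypothetical name:

```r
library(dplyr)

# Hypothetical variant of myfunction() that takes the column name as a
# character string, e.g. "phenotype", instead of a bare column name.
myfunction_chr <- function(df, col, case_name, cntl_name) {
  df %>%
    mutate(
      class = case_when(
        # .data[[col]] looks up the column whose name is stored in `col`
        .data[[col]] == case_name ~ 1,
        .data[[col]] == cntl_name ~ 0
      )
    )
}

df <- data.frame(
  id = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  phenotype = c("blue", "blue", "red", "green", "red")
)

newdf <- myfunction_chr(df, "phenotype", "red", "blue")
newdf$class
```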

Personally, I prefer this version, which uses `enquo()` and `!!`:

myfunction <- function(df, col, case_name, cntl_name) {
  qCol <- enquo(col)
  object <- df %>%
    mutate(
      class = case_when(
        !!qCol == case_name ~ 1,
        !!qCol == cntl_name ~ 0
      )
    )
  return(object)
}

because it makes the separation between environment variables and data frame variables explicit.
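The same concern applies to `case_name` and `cntl_name`: inside `mutate()`, a data frame column with the same name takes precedence over the function argument. A small sketch of this, assuming a hypothetical data frame that happens to contain a column called `case_name`; the `.env` pronoun forces the lookup into the function's environment:

```r
library(dplyr)

# Hypothetical data frame that happens to have a column named `case_name`.
df <- data.frame(
  phenotype = c("blue", "red"),
  case_name = c("blue", "blue")
)

f <- function(df, col, case_name) {
  df %>%
    mutate(
      # unqualified `case_name` is masked by the column df$case_name
      masked = {{ col }} == case_name,
      # .env$case_name refers to the function argument instead
      explicit = {{ col }} == .env$case_name
    )
}

out <- f(df, phenotype, "red")
out
```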

The link in my comment is my go-to page when working with NSE.
