
Consider a tibble where each column is a character vector which can take many values -- let's say "A" through "F".

library(tidyverse)
sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

I wish to create a function which takes a column name as an argument, and recodes that column so that any answer "A" becomes an NA and the df is otherwise returned as is. The reason for designing it this way is to fit into a broader pipeline that performs a series of operations using a given column.

There are many ways to do this, but I am interested in understanding what the best idiomatic tidyeval/tidyverse approach would be. First, the question name needs to be on the left-hand side of a mutate call, so we use the !! and := operators appropriately. But then, what to put on the right-hand side?

fix_question <- function(df, question) {
    df %>% mutate(!!question := recode(... something goes here...))
}

fix_question(sample_df, "q1") # should produce a tibble whose first column is (NA, "B", "C")

My initial thought was that this would work:

df %>% mutate(!!question := recode(!!question, "A" = NA_character_))

But of course the bang-bang inside the function just returns the literal character string (e.g. "q1"). I ended up taking what feels like a hacky route to reference the data on the right-hand side, using the base R [[ operator and relying on the . placeholder from the pipe. It works, so in a sense I have solved my underlying problem:

df %>% mutate(!!question := recode(.[[question]], "A" = NA_character_))

I'm interested in getting feedback from people who are very good at tidyeval as to whether there is a more idiomatic way to do this, in hopes that seeing a worked example would enhance my understanding of the tidyeval function set more generally. Any thoughts?

TylerH
aaron
  • Thanks, this is a clever approach -- I do use the functional approach in other parts in my code and could have thought about doing it here as well. I know some people frown on code style talk on SO, but seeing a few different styles of answer so quickly has been very fruitful for me. – aaron Oct 11 '19 at 18:00
  • Combining several ideas in this question, I believe this is the most succinct version that works with both `q1` (symbol) and `"q1"` (string): `df %>% mutate_at( vars(!!ensym(question)), recode, A = NA_character_)` – Artem Sokolov Oct 11 '19 at 18:56

3 Answers


You can use the "curly curly" method now if you have rlang >= 0.4.0.

Explanation thanks to @eipi10:

This combines the two step process of quote-then-unquote into one step, so {{question}} is equivalent to !!enquo(question)

fix_question <- function(df, question){
  df %>% mutate({{question}} := recode({{question}}, A = NA_character_))
}

fix_question(sample_df, q1)
# # A tibble: 3 x 2
#   q1    q2   
#   <chr> <chr>
# 1 NA    B    
# 2 B     B    
# 3 C     A    

Note that unlike the ensym approach, this doesn't work with character names. Even worse, it silently returns a wrong result instead of raising an error: the string is treated as a constant, so every value in the column is replaced with "q1".

fix_question(sample_df, 'q1')

# # A tibble: 3 x 2
#   q1    q2   
#   <chr> <chr>
# 1 q1    B    
# 2 q1    B    
# 3 q1    A    
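
The equivalence mentioned above can also be spelled out by hand. The following is a minimal sketch of the two-step quote-then-unquote form, assuming a reasonably recent rlang and dplyr; the name fix_question_enquo is hypothetical and only used to avoid clashing with the function above. Note that the quosure has to be unquoted on both sides of :=.

```r
library(dplyr)
library(rlang)

sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

# Hypothetical name; intended to behave like the {{ }} version, written in two steps
fix_question_enquo <- function(df, question) {
  question <- enquo(question)   # step 1: capture the argument as a quosure
  df %>%
    # step 2: unquote the quosure on both sides of :=
    mutate(!!question := recode(!!question, A = NA_character_))
}

result <- fix_question_enquo(sample_df, q1)
result$q1   # first element is now NA, the rest are unchanged
```

Like the curly-curly version, this sketch expects an unquoted column name such as q1, not a string.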
IceCreamToucan
  • I haven't gotten into the "curly curly" habit yet. Do you know why this works, whereas the OP's seemingly-identical "bang bang" version didn't? – camille Oct 11 '19 at 17:52
  • Thanks for mentioning curly-curly, which I had heard was upcoming. The answer does not work for whatever version of rlang/dplyr I have installed; I get an error with the LHS. If I replace the LHS with my LHS and quote q1, I get the same problem I had above; if I don't quote q1, I get an error. This is possibly a version thing. – aaron Oct 11 '19 at 17:56
  • Yeah rlang 0.4.0 was just released at the end of June so if you haven't updated it since then this won't work for you – IceCreamToucan Oct 11 '19 at 17:58
  • I think the bang-bang didn't work because `question` first needs to be turned into a quosure (`question = enquo(question)`) before being used in the dplyr pipe. `{{question}}` is equivalent to `!!enquo(question)`. – eipi10 Oct 11 '19 at 17:58
  • Possibly -- but I did try `df %>% mutate(!!question := recode(!! enquo(question), A = NA_character_))` and got the same problem as `!!question`. Again, this might be because I'm on rlang 0.3.4. – aaron Oct 11 '19 at 18:03
  • You need enquo for the first instance of question too for that to be equivalent. – IceCreamToucan Oct 11 '19 at 18:05

You can make the function a bit more flexible by allowing a vector of recoded values to be entered as an argument as well. For example:

library(tidyverse)
sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

fix_question <- function(df, question, recode.vec) {

  df %>% mutate({{question}} := recode({{question}}, !!!recode.vec))

}

fix_question(sample_df, q1, c(A=NA_character_, B="Was B"))
  q1    q2   
1 <NA>  B    
2 Was B B    
3 C     A

Note that recode.vec is "unquote-spliced" with !!!. You can see what this is doing with this example, adapted from the Programming with dplyr vignette (search for "splice" to see the relevant examples). Note how !!! "splices" the pairs of recoding values into the recode function so that they are used as the ... argument in recode.

x = c("A", "B", "C")
args = c(A=NA_character_, B="Was B")

quo(recode(x, !!!args))

<quosure>
expr: ^recode(x, A = <chr: NA>, B = "Was B")
env:  global

If you want to potentially run the recoding function on multiple columns, you can turn it into a function that takes just a column name and a recoding vector. This approach seems like it would be more pipe-friendly.

fix_question <- function(question, recode.vec) {

  recode({{question}}, !!!recode.vec)

}

sample_df %>% 
  mutate_at(vars(matches("q")), list(~fix_question(., c(A=NA_character_, B="Was B"))))
  q1    q2   
1 <NA>  Was B
2 Was B Was B
3 C     <NA>

Or to recode a single column:

sample_df %>% 
  mutate(q1 = fix_question(q1, c(A=NA_character_, B="Was B")))
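
For readers on dplyr >= 1.0.0, where mutate_at has been superseded, the multi-column call can presumably be written with across() instead. This is a minimal sketch under that assumption; recode_answers is a hypothetical name for the same helper, written with a plain argument because it receives a vector, not a column name.

```r
library(dplyr)

sample_df <- tibble(q1 = c("A", "B", "C"), q2 = c("B", "B", "A"))

# Hypothetical helper: takes a plain vector, so no quoting is needed
recode_answers <- function(x, recode.vec) {
  recode(x, !!!recode.vec)  # splice the named vector into recode()'s ...
}

# Assumes dplyr >= 1.0.0 for across()
recoded <- sample_df %>%
  mutate(across(matches("q"),
                ~ recode_answers(.x, c(A = NA_character_, B = "Was B"))))
```

The result should match the mutate_at output shown above; only the column-selection syntax differs.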
eipi10

Here, on the right-hand side of :=, we can use sym to convert the string to a symbol and then unquote it with !!, so that it is evaluated as a column of the data:

fix_question <- function(df, question) {
  df %>%
    mutate(!!question := recode(!! rlang::sym(question), "A" = NA_character_))
}

fix_question(sample_df, "q1") 
# A tibble: 3 x 2
#  q1    q2   
#  <chr> <chr>
#1 <NA>  B    
#2 B     B    
#3 C     A    

A better approach that works for both quoted and unquoted input is ensym:

fix_question <- function(df, question) {
  question <- ensym(question)
  df %>%
    mutate(!!question := recode(!! question, "A" = NA_character_))
}


fix_question(sample_df, q1)
# A tibble: 3 x 2
#  q1    q2   
#  <chr> <chr>
#1 <NA>  B    
#2 B     B    
#3 C     A    

fix_question(sample_df, "q1")
# A tibble: 3 x 2
#  q1    q2   
#  <chr> <chr>
#1 <NA>  B    
#2 B     B    
#3 C     A    
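
To see why the conversion to a symbol matters, it can help to look at the expressions that !! builds before dplyr evaluates them. This is a small illustrative sketch using rlang::expr(); the variable names as_string_expr and as_symbol_expr are made up for this example.

```r
library(rlang)

question <- "q1"

# Unquoting the string inlines it as a constant
as_string_expr <- expr(recode(!!question, A = NA_character_))

# Converting to a symbol first makes it a reference to the column
as_symbol_expr <- expr(recode(!!sym(question), A = NA_character_))

print(as_string_expr)  # recode("q1", A = NA_character_)
print(as_symbol_expr)  # recode(q1, A = NA_character_)
```

In the first expression recode() receives the literal string "q1"; in the second it receives the name q1, which mutate() then looks up in the data frame.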
akrun
  • I had tried to putz around with a few of the rlang conversion functions but obviously didn't choose the right one, but your approach works -- I think really I just need to work through the type conversions in my head. My !!question doesn't work because it evaluates a character string literally. Yours works because it first converts the character string to a symbol, and then evaluates the symbol, returning the vector. I just couldn't wrap my head around that being the order of operations. Thanks again. – aaron Oct 11 '19 at 17:59