I'm trying to write a function that takes a dataframe, converts a column from chr to dbl, then adds 1 to that column. I also want to optionally replace certain values with NA. If the relevant argument is not used, I want the function to skip the NA replacement step.

Data

library(tibble)
library(dplyr)
library(magrittr)

df <-
  tibble(id = 1:10, col_of_interest = 21:30) %>%
  add_row(id = 11, col_of_interest = 999) %>%
  mutate(across(col_of_interest, as.character))

df

## # A tibble: 11 x 2
##       id col_of_interest
##    <dbl> <chr>          
##  1     1 21             
##  2     2 22             
##  3     3 23             
##  4     4 24             
##  5     5 25             
##  6     6 26             
##  7     7 27             
##  8     8 28             
##  9     9 29             
## 10    10 30             
## 11    11 999  

Writing a function

The function should:

  1. Take in the data.
  2. Convert col_of_interest from chr to dbl.
  3. Replace 999 with NA (but only if I specified that 999 should be replaced with NA)
  4. Add 1 to col_of_interest

My attempt

When writing my function I was guided by two resources:

  1. Passing data variables into function arguments using {{ var }} as covered here.
  2. The use of if is based on this answer.

add_one <- function(data, var, na_if_val = NULL) {

  data %>%

    mutate(across({{ var  }}, as.numeric)) %>%
    
    {if( is.null( {{ na_if_val }} )
    ) .  # <--- the dot means: "return the preexisting dataframe"

      else

        na_if( {{ na_if_val }} )

    } %>%
    
    mutate(across({{ var  }}, add, 1))
}

When I test the function on my df object I get an error.

add_one(data = df,
        var = col_of_interest,
        na_if_val = "999")

Error in check_length(y, x, fmt_args("y"), glue("same as {fmt_args(~x)}")) : argument "y" is missing, with no default

Googling this error yielded this page, stating that:

Note, however, that na_if() can only take arguments of length one.

However, incorporating only na_if( {{ na_if_val }} ) in add_one function's pipe does work. It's the conditional evaluation combined with is.null that causes the function to break. I don't understand why.

Emman

2 Answers

You have several problems, but the main one is that you are doing the non-standard evaluation wrong.

add_one <- function(data, var, na_if_val = NULL) {
  
  var_b <- enquo(var)
  
  data <- data %>%
    mutate(across(!!var_b, as.numeric)) 
  
   if(!is.null(na_if_val)){
     data <- data %>% 
       mutate(across(!!var_b, na_if, y = na_if_val))
   }
   
  data <- data %>% 
    mutate(across(!!var_b, add, 1))
  
  return(data)
}

Returning this:

add_one(df, col_of_interest, 999)

# A tibble: 11 x 2
      id col_of_interest
   <dbl>           <dbl>
 1     1              22
 2     2              23
 3     3              24
 4     4              25
 5     5              26
 6     6              27
 7     7              28
 8     8              29
 9     9              30
10    10              31
11    11              NA

First, you need to quote the variable of interest with the enquo() function; then you unquote it (with bang-bang, !!) wherever you need it. Another problem with your function is the if statement inserted in the middle of the pipe. Inside a { } block, %>% no longer inserts the data as the first argument, so na_if() receives your value as x and y is missing. If you need to apply a step only in special cases, it is simpler to evaluate it separately from the main calculation.
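To make the "argument "y" is missing" error from the question easier to see, here is a minimal sketch of how %>% treats a braced right-hand side. The helper show_args() is hypothetical and only reports which arguments it received; it is not part of magrittr or dplyr.

```r
library(magrittr)

# Hypothetical helper: reports whether `y` was supplied.
show_args <- function(x, y) {
  if (missing(y)) "y is missing" else paste("x =", x, "| y =", y)
}

# Plain call: the pipe inserts the left-hand side as the first argument.
r_plain <- 10 %>% show_args(999)

# Braced block: nothing is inserted, so 999 becomes `x` and `y` is missing.
r_braced <- 10 %>% { show_args(999) }

# Braced block with an explicit dot: both arguments are supplied.
r_fixed <- 10 %>% { show_args(x = ., y = 999) }

print(c(r_plain, r_braced, r_fixed))
```

The second call mirrors na_if( {{ na_if_val }} ) inside the braces in the question; the third mirrors the explicit x = . form.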

Pedro Faria
  • Thanks! When writing the function I followed this page: (https://dplyr.tidyverse.org/articles/programming.html). "When you have the data-variable in a function argument (i.e. an env-variable that holds a promise), you need to embrace the argument by surrounding it in doubled braces, like filter(df, {{ var }})." Would you mind explaining why this isn't the case here? – Emman Oct 01 '20 at 11:39
  • Hey @Emman, I'm not sure about this, but I think the curly braces `{}` are used to quote expressions, not variable names per se. This does not cause any problem: if you substitute `{{ var }}` for `!!var_b` in my code, you'll see that it works as well. It's just that `enquo()` and `!!` are the more standard way to do it. Copy it? – Pedro Faria Oct 01 '20 at 11:47
  • For sure. But I now understand that my problem also has to do with using newer methods of tidy evaluation. Option A: I came across a limitation of tidy evaluation that can't work in my case; which then means I should use more traditional and trusted methods like yours. Option B: I'm not using tidy evaluation properly. – Emman Oct 01 '20 at 11:57
  • It's not a matter of newer or older, but of what is the right application for your case. `enquo()` and `!!` are commonly used to select variable names from function arguments. But this method does not work, for example, if you want to select in `across()` all the columns that are numeric. In this case, you need to use the curly braces, by passing the expression `where(is.numeric)` to the argument `var`, in a line of a function like this: `mutate(across({{ var }}))` – Pedro Faria Oct 01 '20 at 12:06
  • I think I fixed it using my initial code. I simply specified the `x` and `y` arguments of `na_if` explicitly in the `else` part: `na_if(x = ., y = {{ na_if_val }} )`. And it runs without using `!!` or `enquo()` at all. So it does run, but would you still argue that it's not a correct way to write the function? – Emman Oct 01 '20 at 13:14
  • Absolutely not @Emman. The most important thing is to get your work done in a safe way. If you fixed your problem, great! You made it! I just said that I think curly braces are more appropriate for evaluating expressions (or a group of functions passed as an argument of your function), and `enquo()` & `!!` are used more for evaluating variable names. So it's just a matter of standards. By adopting the right standard, your work becomes more fluid, because you are less likely to run into errors. – Pedro Faria Oct 01 '20 at 13:23
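
As a small illustration of the `where(is.numeric)` case mentioned in the comments, here is a sketch of a function whose `var` argument is embraced inside `across()`. The function name `double_cols` and the demo tibble are made up for this example, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: doubles whatever columns `var` selects.
double_cols <- function(data, var) {
  mutate(data, across({{ var }}, ~ .x * 2))
}

df_demo <- tibble(a = 1:3, b = c(0.5, 1.5, 2.5), label = c("x", "y", "z"))

# `var` can be a bare column name ...
by_name <- double_cols(df_demo, a)

# ... or a selection expression such as where(is.numeric).
by_predicate <- double_cols(df_demo, where(is.numeric))

print(by_name)
print(by_predicate)
```

In the first call only `a` is doubled; in the second, both numeric columns are doubled and `label` is left untouched.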

I solved the problem by simply specifying the x and y arguments of na_if explicitly.

add_one <- function(data, var, na_if_val = NULL) {

  data %>%

    mutate(across({{ var  }}, as.numeric)) %>%
    
    {if( is.null( {{ na_if_val }} )
    ) .  # <--- the dot means: "return the preexisting dataframe"

      else

        na_if(x = ., y = {{ na_if_val }} ) ## <-- change is here

    } %>%
    
    mutate(across({{ var  }}, add, 1))
}


add_one(data = df,
        var = col_of_interest,
        na_if_val = 999)

## # A tibble: 11 x 2
##       id col_of_interest
##    <dbl>           <dbl>
##  1     1              22
##  2     2              23
##  3     3              24
##  4     4              25
##  5     5              26
##  6     6              27
##  7     7              28
##  8     8              29
##  9     9              30
## 10    10              31
## 11    11              NA

EDIT

I removed {{ }} around na_if_val following @LionelHenry's comment.

add_one <- function(data, var, na_if_val = NULL) {

  data %>%

    mutate(across({{ var  }}, as.numeric)) %>%

    {if( is.null(na_if_val)
    ) .  # <--- the dot means: "return the preexisting dataframe"

      else

        na_if(x = ., y = na_if_val)

    } %>%

    mutate(across({{ var  }}, add, 1))
}
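
One thing to keep in mind: na_if(x = ., y = na_if_val) is applied to the piped data frame as a whole rather than to var alone, so a 999 in any other column would be replaced too, and I would not rely on na_if() accepting a whole data frame in every dplyr version. Below is a sketch of a variant that limits the replacement to the selected column. The name add_one_col and the two-row demo data are made up for this example, and ~ .x + 1 stands in for magrittr's add.

```r
library(dplyr)

add_one_col <- function(data, var, na_if_val = NULL) {
  # Convert the selected column(s) to numeric first.
  data <- mutate(data, across({{ var }}, as.numeric))

  # Replace the sentinel only when one was supplied, and only in `var`.
  if (!is.null(na_if_val)) {
    data <- mutate(data, across({{ var }}, ~ na_if(.x, na_if_val)))
  }

  # Add 1 to the selected column(s).
  mutate(data, across({{ var }}, ~ .x + 1))
}

# Demo data: 999 appears both in `id` and in the column of interest.
df_small <- tibble(id = c(1, 999), col_of_interest = c("21", "999"))

with_na <- add_one_col(df_small, col_of_interest, na_if_val = 999)
without_na <- add_one_col(df_small, col_of_interest)

print(with_na)
print(without_na)
```

With na_if_val = 999 only col_of_interest gets the NA and id keeps its 999; without it, the replacement step is skipped.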

Emman
  • Hi @Emman. You should only use `!!` and `{{` with data-masking functions. Here you're using `{{` with `is.null()` and `na_if()` which are normal functions. This is not doing anything. It works because `na_if_val` is also a normal argument. I recommend just removing `{{` around the `na_if_val` references, this way your function will be simpler and less confusing. – Lionel Henry Oct 01 '20 at 13:58
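
A quick way to check the point in the comment above: outside data-masking functions, `{{ v }}` is parsed by R as two nested pairs of ordinary braces, so it just evaluates `v`. The sketch below uses two hypothetical helpers, `f_embraced` and `f_plain`, that differ only in the braces.

```r
# Hypothetical helpers: identical except for the doubled braces.
f_embraced <- function(v = NULL) is.null({{ v }})
f_plain    <- function(v = NULL) is.null(v)

# Both versions agree with and without a supplied value.
same_when_null  <- identical(f_embraced(), f_plain())
same_when_value <- identical(f_embraced(999), f_plain(999))

print(c(same_when_null, same_when_value))
```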