
I create a series of dynamically named variables within a function and then want to create new variables based on the ones previously created. Specifically, I'm trying to write a function that flags outliers in a series of variables in a data frame. Here is the code so far, with the last line testing it on the mtcars data frame:

# create function
library(dplyr)

remove_outliers <- function(data, vars, group_vars) {

  data %>%
    group_by(across({{ group_vars }})) %>%
    # per-group quartiles, IQR, and the lower/upper 1.5 * IQR fences
    mutate(across({{ vars }}, list(quartile3 = ~ quantile(.x, 0.75, na.rm = TRUE),
                                   quartile1 = ~ quantile(.x, 0.25, na.rm = TRUE),
                                   iqr = ~ IQR(.x, na.rm = TRUE),
                                   iqr_outlier_1 = ~ (quantile(.x, 0.25, na.rm = TRUE) - (1.5 * IQR(.x, na.rm = TRUE))),
                                   iqr_outlier_3 = ~ (quantile(.x, 0.75, na.rm = TRUE) + (1.5 * IQR(.x, na.rm = TRUE)))))) %>%
    ungroup() %>%
    # this is the step that fails: {{ vars }}_iqr_outlier_1 is not valid R
    mutate(across({{ vars }}, list(outlier = ~ case_when(.x < {{ vars }}_iqr_outlier_1 | .x > {{ vars }}_iqr_outlier_3 ~ 0,
                                                         TRUE ~ 1))))
}

# test function
remove_outliers(mtcars, group_vars = cyl, vars = c("disp", "mpg")) %>%
  glimpse()

I use mutate() + across() to create columns holding the 1st and 3rd quartiles and the interquartile range (IQR), and then create columns holding the lower and upper fences (1st quartile - 1.5 * IQR and 3rd quartile + 1.5 * IQR). In the next mutate() + across() call, I want to identify observations that fall between the dynamically named variables created in the previous few lines of code (e.g., {{ vars }}_iqr_outlier_1 and {{ vars }}_iqr_outlier_3).

But I can't figure out how to refer to those previously created variables dynamically, and I haven't seen anything on this.

Any help would be greatly appreciated.

kaseyzapatka
  • You're not going to be able to use the `{{}}` to build new variable names. You're going to need to do extra work to build column names as strings and then use the `.data` pronoun to reference those columns. Pretty much the same strategy as used on this answer: https://stackoverflow.com/questions/71014763/how-to-pass-a-dynamic-column-name-in-a-pipe-in-custom-function-in-r/71014993#71014993 – MrFlick Mar 04 '22 at 03:16
  • Hmm.. this seems really complicated given I have so many vars. I'm surprised there isn't a way to call newly created dynamic variables on the right hand side of the = . – kaseyzapatka Mar 04 '22 at 05:03
  • If you post workable code you're more likely to get a helpful answer. Your function definition doesn't have a `group` parameter. `mtcars` can't be grouped by `tenure_status`. – Michael Dewar Mar 04 '22 at 08:02
  • oops fixed that one parameter – kaseyzapatka Mar 04 '22 at 13:25
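
The strategy suggested in the first comment (build the column names as strings, then look them up with the `.data` pronoun) could be sketched roughly as follows. This is only a sketch, not a tested answer from the thread: the function name `flag_outliers` and the helper variables `var_names`, `lower`, and `upper` are made up for illustration, and it assumes dplyr 1.0 or later. It keeps the question's coding, where 0 marks an outlier and 1 marks a non-outlier.

```r
library(dplyr)

# Hypothetical helper illustrating the "names as strings + .data" approach
flag_outliers <- function(data, vars, group_vars) {
  # Resolve the tidy-select input to plain column names (character vector)
  var_names <- names(select(data, {{ vars }}))

  # Per-group lower and upper fences, stored as <var>_iqr_outlier_1 / _3
  out <- data %>%
    group_by(across({{ group_vars }})) %>%
    mutate(across(all_of(var_names),
                  list(iqr_outlier_1 = ~ quantile(.x, 0.25, na.rm = TRUE) - 1.5 * IQR(.x, na.rm = TRUE),
                       iqr_outlier_3 = ~ quantile(.x, 0.75, na.rm = TRUE) + 1.5 * IQR(.x, na.rm = TRUE)))) %>%
    ungroup()

  # For each variable, build the fence column names and compare via .data[[...]]
  for (v in var_names) {
    lower <- paste0(v, "_iqr_outlier_1")
    upper <- paste0(v, "_iqr_outlier_3")
    out <- out %>%
      mutate("{v}_outlier" := case_when(
        .data[[v]] < .data[[lower]] | .data[[v]] > .data[[upper]] ~ 0,  # outside the fences
        TRUE ~ 1                                                        # within the fences
      ))
  }
  out
}

result <- flag_outliers(mtcars, vars = c(disp, mpg), group_vars = cyl)
glimpse(result)
```

The loop runs once per variable, so the amount of code does not grow with the number of columns passed in `vars`. If the intermediate fence columns are not needed in the output, they could be dropped afterwards with `select(-ends_with("_iqr_outlier_1"), -ends_with("_iqr_outlier_3"))`.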
