I create a series of dynamically named variables within a function and then want to create new variables based on the ones previously created. Specifically, I'm trying to write a function that flags outliers in a set of variables in a data frame. Here is the code so far, with the last call testing it on the mtcars data frame:
library(dplyr)

# create function
remove_outliers <- function(data, vars, group_vars) {
  data %>%
    group_by(across({{ group_vars }})) %>%
    # per-group quartiles, IQR, and lower/upper fences for each variable
    mutate(across({{ vars }}, list(
      quartile3 = ~ quantile(.x, 0.75, na.rm = TRUE),
      quartile1 = ~ quantile(.x, 0.25, na.rm = TRUE),
      iqr = ~ IQR(.x, na.rm = TRUE),
      iqr_outlier_1 = ~ quantile(.x, 0.25, na.rm = TRUE) - 1.5 * IQR(.x, na.rm = TRUE),
      iqr_outlier_3 = ~ quantile(.x, 0.75, na.rm = TRUE) + 1.5 * IQR(.x, na.rm = TRUE)
    ))) %>%
    ungroup() %>%
    # this is the step that does not work: {{ vars }}_iqr_outlier_1 is not valid R
    mutate(across({{ vars }}, list(
      outlier = ~ case_when(
        .x < {{ vars }}_iqr_outlier_1 | .x > {{ vars }}_iqr_outlier_3 ~ 0,
        TRUE ~ 1
      )
    )))
}

# test function
remove_outliers(mtcars, vars = c("disp", "mpg"), group_vars = cyl) %>%
  glimpse()
I use mutate() + across() to create columns holding the 1st and 3rd quartiles and the interquartile range (IQR) for each variable, and then columns holding the lower and upper fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR). In the next mutate() + across() call, I want to flag observations that fall outside the fences stored in the dynamically named columns created a few lines earlier (i.e., {{ vars }}_iqr_outlier_1 and {{ vars }}_iqr_outlier_3).
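To make the intended rule concrete, here is a minimal base-R sketch of the same fence check for a single vector. The helper name flag_iqr_outliers, its argument k, and the example vector x are hypothetical and only for illustration; this does not address the dynamic column lookup itself.

```r
# Hypothetical helper: TRUE where a value lies outside the IQR fences.
flag_iqr_outliers <- function(x, k = 1.5) {
  q1 <- quantile(x, 0.25, na.rm = TRUE, names = FALSE)
  q3 <- quantile(x, 0.75, na.rm = TRUE, names = FALSE)
  iqr <- q3 - q1             # same value as IQR(x, na.rm = TRUE)
  lower <- q1 - k * iqr      # corresponds to the *_iqr_outlier_1 column
  upper <- q3 + k * iqr      # corresponds to the *_iqr_outlier_3 column
  x < lower | x > upper
}

# Made-up example data: only the last value should be flagged.
x <- c(1, 2, 3, 4, 5, 100)
is_outlier <- flag_iqr_outliers(x)
```

Inside the function above, the same comparison would have to run once per variable in vars (within each group), against that variable's own fence columns.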
But I can't figure out how to refer to those previously created variables dynamically, and I haven't found anything that covers this.
Any help would be greatly appreciated.