Programming Functions: Accessing variables of dataframes created within a function

Question

I wrote a function that further processes the input dataframe, say by filtering participants on valuevariable > 3. For example:

function.example <- function(dataframe, valuevariable, conditionvariable) {

processed.dataframe <- dataframe %>% filter(valuevariable > 3)
....#more code
}

You call the function like this: function.example(df, df$latency, df$group). Now, say I want to access the conditionvariable of processed.dataframe inside the function. Typically, you'd do that with processed.dataframe$group.

For example:

function.example <- function(dataframe, valuevariable, conditionvariable) {

processed.dataframe <- dataframe %>% filter(valuevariable > 3)

#say I now want to make sure the conditionvariable is a factor
processed.dataframe$conditionvariable <- as.factor(processed.dataframe$conditionvariable)
}

The problem is that I cannot refer to the group variable with $conditionvariable; I need to write $group. If I have diverse datasets, the conditionvariable will not be called group every time. Hence, I'm looking for a way to access processed.dataframe$[name of conditionvariable] regardless of what the conditionvariable is named. Does anyone know how to do that?

Possible duplicate: https://stackoverflow.com/questions/18222286/dynamically-select-data-frame-columns-using-and-a-character-value — starja, Aug 23 '20 at 11:19
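
For reference, the approach in the linked question comes down to swapping $ for [[, which accepts a character value. Below is a minimal base-R sketch of that idea applied to the function from the question. It assumes the column names are passed as strings rather than as df$latency and df$group, and the small data frame is made up purely for illustration.

```r
# Hypothetical rewrite of function.example(); column names are passed as strings.
function.example <- function(dataframe, valuevariable, conditionvariable) {
  # Keep rows whose value column exceeds 3, dropping rows where it is NA
  keep <- !is.na(dataframe[[valuevariable]]) & dataframe[[valuevariable]] > 3
  processed.dataframe <- dataframe[keep, , drop = FALSE]

  # `[[` looks the column up by the string stored in conditionvariable
  processed.dataframe[[conditionvariable]] <-
    as.factor(processed.dataframe[[conditionvariable]])

  processed.dataframe
}

# Made-up data for illustration only
df <- data.frame(latency = c(1, 5, 2, 7, 4),
                 group = c("a", "b", "a", "b", "a"),
                 stringsAsFactors = FALSE)

result <- function.example(df, "latency", "group")
```

Because the names are plain strings, the same function works on a dataset where the condition column is called something other than group.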

score 1 · Accepted Answer · answered Aug 23 '20 at 12:04

For verbs that do some variation of filter() or mutate(), you have two choices. If you want your function to take the variable name as a character input, you use .data[[var]].

library(dplyr)

df <- data.frame(a = factor(sample(LETTERS, 100, replace = TRUE)),
                 x = runif(100),
                 y = rnorm(100),
                 z = rexp(100))
filter_top_half <- function(df, var) {
   df %>% filter(.data[[var]] >= median(.data[[var]]))
}
df %>% filter_top_half(var = "x") %>% tibble()
# A tibble: 50 x 4
   a         x       y     z
   <fct> <dbl>   <dbl> <dbl>
 1 U     0.790  0.424  0.894
 2 D     0.621 -0.0769 0.640
 3 X     0.694 -0.290  0.168
 4 L     0.814 -1.32   0.933
 5 R     0.823 -1.80   0.588
 6 R     0.742  1.10   0.153
 7 W     0.849 -0.577  1.48 
 8 C     0.851  1.32   0.353
 9 A     0.727  0.662  2.03 
10 X     0.615  0.441  1.27 
# ... with 40 more rows
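
Applied to the function from the question, the character-input approach might look like the sketch below. It assumes dplyr 1.0.0 or later (for across() and all_of()), that both column names are passed as strings, and a small made-up data frame.

```r
library(dplyr)

# Hypothetical version of function.example() taking column names as strings
function.example <- function(dataframe, valuevariable, conditionvariable) {
  dataframe %>%
    # .data[[...]] looks up the column named by the string
    filter(.data[[valuevariable]] > 3) %>%
    # all_of() selects the column named by the string; as.factor converts it
    mutate(across(all_of(conditionvariable), as.factor))
}

# Made-up data for illustration only
df <- data.frame(latency = c(1, 5, 2, 7, 4),
                 group = c("a", "b", "a", "b", "a"),
                 stringsAsFactors = FALSE)

result <- function.example(df, "latency", "group")
```

The call now passes "latency" and "group" instead of df$latency and df$group, so the function never needs to know the actual column names in advance.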

You can also pass the desired variable without quotes, in which case you need to wrap it in {{ var }} inside the function. This applies to every data-masking verb, including group_by() and summarise(). Verbs that use tidy selection, such as select(), accept {{ var }} in the same way.

library(tidyr)  # for separate()

summary_stats <- function(df, var, group = NULL) {
  df %>% group_by({{ group }}) %>%
         summarise(summary = paste(summary({{var}}), collapse = ",")) %>%
         separate(summary, c("Min", "1st_QT", "Median", "Mean", "3rd_QT", "Max"), 
                  sep = ",", convert = TRUE)
}
df %>% summary_stats(y)
        Min     1st_QT   Median      Mean    3rd_QT      Max
1 -1.851692 -0.4026616 0.137691 0.1159647 0.7375944 2.284116
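
The same idea carries over to the function from the question. The sketch below assumes dplyr 1.0.0 or later (for across()) and that the columns are passed as bare names, not as df$latency and df$group. The data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical version of function.example() taking bare column names
function.example <- function(dataframe, valuevariable, conditionvariable) {
  dataframe %>%
    # {{ }} forwards the caller's column name into the data-masked expression
    filter({{ valuevariable }} > 3) %>%
    # across() uses tidy selection, which also understands {{ }}
    mutate(across({{ conditionvariable }}, as.factor))
}

# Made-up data for illustration only
df <- data.frame(latency = c(1, 5, 2, 7, 4),
                 group = c("a", "b", "a", "b", "a"),
                 stringsAsFactors = FALSE)

result <- function.example(df, latency, group)
```

Whether to take strings or bare names is mostly a matter of how the function will be called: strings are easier to build programmatically, bare names read more like ordinary dplyr code.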
