Using dynamic names with dplyr

Question

I have a data frame in which I would I would like to compute some extra column as a function of the existing columns, but want to specify both each new column name and the function dynamically. I have a vector of column names that are already in the dataframe df_daily:

DAILY_QUESTIONS  <- c("Q1_Daily",  "Q2_Daily",   "Q3_Daily",  "Q4_Daily",  "Q5_Daily")

The rows of the dataframe have responses to each question from each user each time they answer the questionnaire, as well as a column with the number of days since the user first answered the questionnaire (i.e. Days_From_First_Use = 0 on the very first use, = 1 if it is used the next day etc.). I want to average the responses to these questions by Days_From_First_Use . I start by by grouping my dataframe by Days_From_First_Use:

df_test <- df_daily %>%
             group_by(Days_From_First_Use)

and then try averaging the responses in a loop as follows:

for(i in 1:5){
   df_test <- df_test %>%
                 mutate(!! paste0('Avg_Score_', DAILY_QUESTIONS[i]) := 
                 paste0('mean(', DAILY_QUESTIONS[i], ')'))
}

Unfortunately, while my new variable names are correct ("Avg_Score_Q1_Daily", "Avg_Score_Q2_Daily", "Avg_Score_Q3_Daily", "Avg_Score_Q4_Daily", "Avg_Score_Q5_Daily"), my answers are not: every row in my data frame has a string such as "mean(Q1_Daily)" in the relevant column .

So I'm clearly doing something wrong - what do I need to do fix this and get the average score across all users on each day?

Sincerely and with many thanks in advance

Thomas Philips

Have you seen Tom Roth's answer to the similarly named FAQ, [How to use dynamic variable names in dplyr?](https://stackoverflow.com/a/49311813/903061) It shows examples using variables on both sides of `:=`. — Gregor Thomas, Dec 07 '20 at 02:08
I did, but didn't fully understand it. I'll go back and review it again and see if it offers a fix for my problem. — Thomas Philips, Dec 07 '20 at 02:09
Actually, looking more, I think the more current approach is using `{{` as in [Ronak's answer](https://stackoverflow.com/a/59224230/903061). You can also see examples under the sentence *"If you want to use the names of variables in the output, you can use glue syntax in conjunction with `:=`"* in the [Programming with dplyr vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html) — Gregor Thomas, Dec 07 '20 at 02:11
No luck: I create a function `my_mutate_mean <- function(df, question) { df %>% mutate( "Avg_score_{{question}}" := mean({{ question }}) ) }' and then call it in a loop 'for(i in 1:5){ df_test <- my_mutate_mean(df_test, DAILY_QUESTIONS[i]) }` but i only get a column of NAs in variable named Avg_Score_DAILY_QUESTIONS[i] — Thomas Philips, Dec 07 '20 at 02:39
Could you share enough data to make your example reproducible so we can test? — Gregor Thomas, Dec 07 '20 at 02:43
Though, perhaps you can skip the loop and use `df %>% mutate(across(all_of(DAILY_QUESTIONS), mean, .names = "Avg_score_{{.col}}"))` — Gregor Thomas, Dec 07 '20 at 02:45
It will take me a little time, but I will create a toy data set and upload it. — Thomas Philips, Dec 07 '20 at 02:54

score 1 · Accepted Answer · answered Dec 07 '20 at 03:04

1

I took a somewhat different approach, using summarize(across(...)) after group_by(Days_From_First_Use) I achieve the dynamic names by using rename_with and a custom function that replaces (starts with)"Q" with "Avg_Score_Q"

library(dplyr, warn.conflicts = FALSE)

# fake data -- 30 normalized "responses" from 0 to 2 days from first use to 5 questions
DAILY_QUESTIONS  <- c("Q1_Daily",  "Q2_Daily",   "Q3_Daily",  "Q4_Daily",  "Q5_Daily")
df_daily <- as.data.frame(do.call('cbind', lapply(1:5, function(i) rnorm(30, i))))
colnames(df_daily) <- DAILY_QUESTIONS
df_daily$Days_From_First_Use <- floor(runif(30, 0, 3))

df_test <- df_daily %>%
  group_by(Days_From_First_Use) %>%
  summarize(across(.fns = mean)) %>%
  rename_with(.fn  = function(x) gsub("^Q","Avg_Score_Q",x))
#> `summarise()` ungrouping output (override with `.groups` argument)

df_test  
#> # A tibble: 3 x 6
#>   Days_From_First… Avg_Score_Q1_Da… Avg_Score_Q2_Da… Avg_Score_Q3_Da…
#>              <dbl>            <dbl>            <dbl>            <dbl>
#> 1                0            1.26              1.75             3.02
#> 2                1            0.966             2.14             3.48
#> 3                2            1.08              2.45             3.01
#> # … with 2 more variables: Avg_Score_Q4_Daily <dbl>, Avg_Score_Q5_Daily <dbl>

^{Created on 2020-12-06 by the reprex package (v0.3.0)}

answered Dec 07 '20 at 03:04

Andrew Brown

1,045
6
13

You can probably use the `.names` arg of `across()` instead of `rename`: `summarize(across(all_of(DAILY_QUESTIONS), mean, .names = "Avg_score_{{.col}}"))` – Gregor Thomas Dec 07 '20 at 03:13
2

Oops, had an extra `{` - try `df_daily %>% group_by(Days_From_First_Use) %>% summarize(across(.fns = mean, .names = "Avg_Score_{.col}"))`, works great! – Gregor Thomas Dec 07 '20 at 03:28
Thats good, yeah -- all these methods have a ton of extensible options! – Andrew Brown Dec 07 '20 at 03:55
Perfect - thanks a million for the assistance! Removing the additional braces did the trick. I used mutate instead of summarize so as to add new columns to the existing dataframe instead of creating a new dataframe. Here is the final version of my code: `df_daily <- df_daily %>% group_by(Days_From_First_Use) %>% mutate(across(all_of(DAILY_QUESTIONS), mean, .names = "Avg_Score_{.col}"))` – Thomas Philips Dec 07 '20 at 14:16
One final enhancement: the prefix does not have to be hardcoded - I can set it using a variable. I start by writing `MY_PREFIX <- "abc"` followed by `df_daily <- df_daily %>% group_by(Days_From_First_Use) %>% mutate(across(all_of(DAILY_QUESTIONS), mean, .names = paste0(MY_PREFIX, "{.col}")))` – Thomas Philips Dec 07 '20 at 18:48

Using dynamic names with dplyr

1 Answers1