0

I have a data frame in which I would I would like to compute some extra column as a function of the existing columns, but want to specify both each new column name and the function dynamically. I have a vector of column names that are already in the dataframe df_daily:

DAILY_QUESTIONS  <- c("Q1_Daily",  "Q2_Daily",   "Q3_Daily",  "Q4_Daily",  "Q5_Daily")

The rows of the dataframe have responses to each question from each user each time they answer the questionnaire, as well as a column with the number of days since the user first answered the questionnaire (i.e. Days_From_First_Use = 0 on the very first use, = 1 if it is used the next day etc.). I want to average the responses to these questions by Days_From_First_Use . I start by by grouping my dataframe by Days_From_First_Use:

df_test <- df_daily %>%
             group_by(Days_From_First_Use)

and then try averaging the responses in a loop as follows:

for(i in 1:5){
   df_test <- df_test %>%
                 mutate(!! paste0('Avg_Score_', DAILY_QUESTIONS[i]) := 
                 paste0('mean(', DAILY_QUESTIONS[i], ')'))
}

Unfortunately, while my new variable names are correct ("Avg_Score_Q1_Daily", "Avg_Score_Q2_Daily", "Avg_Score_Q3_Daily", "Avg_Score_Q4_Daily", "Avg_Score_Q5_Daily"), my answers are not: every row in my data frame has a string such as "mean(Q1_Daily)" in the relevant column .

So I'm clearly doing something wrong - what do I need to do fix this and get the average score across all users on each day?

Sincerely and with many thanks in advance

Thomas Philips

Thomas Philips
  • 935
  • 2
  • 11
  • 22
  • Have you seen Tom Roth's answer to the similarly named FAQ, [How to use dynamic variable names in dplyr?](https://stackoverflow.com/a/49311813/903061) It shows examples using variables on both sides of `:=`. – Gregor Thomas Dec 07 '20 at 02:08
  • I did, but didn't fully understand it. I'll go back and review it again and see if it offers a fix for my problem. – Thomas Philips Dec 07 '20 at 02:09
  • Actually, looking more, I think the more current approach is using `{{` as in [Ronak's answer](https://stackoverflow.com/a/59224230/903061). You can also see examples under the sentence *"If you want to use the names of variables in the output, you can use glue syntax in conjunction with `:=`"* in the [Programming with dplyr vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html) – Gregor Thomas Dec 07 '20 at 02:11
  • No luck: I create a function `my_mutate_mean <- function(df, question) { df %>% mutate( "Avg_score_{{question}}" := mean({{ question }}) ) }' and then call it in a loop 'for(i in 1:5){ df_test <- my_mutate_mean(df_test, DAILY_QUESTIONS[i]) }` but i only get a column of NAs in variable named Avg_Score_DAILY_QUESTIONS[i] – Thomas Philips Dec 07 '20 at 02:39
  • Could you share enough data to make your example reproducible so we can test? – Gregor Thomas Dec 07 '20 at 02:43
  • Though, perhaps you can skip the loop and use `df %>% mutate(across(all_of(DAILY_QUESTIONS), mean, .names = "Avg_score_{{.col}}"))` – Gregor Thomas Dec 07 '20 at 02:45
  • It will take me a little time, but I will create a toy data set and upload it. – Thomas Philips Dec 07 '20 at 02:54

1 Answers1

1

I took a somewhat different approach, using summarize(across(...)) after group_by(Days_From_First_Use) I achieve the dynamic names by using rename_with and a custom function that replaces (starts with)"Q" with "Avg_Score_Q"

library(dplyr, warn.conflicts = FALSE)

# fake data -- 30 normalized "responses" from 0 to 2 days from first use to 5 questions
DAILY_QUESTIONS  <- c("Q1_Daily",  "Q2_Daily",   "Q3_Daily",  "Q4_Daily",  "Q5_Daily")
df_daily <- as.data.frame(do.call('cbind', lapply(1:5, function(i) rnorm(30, i))))
colnames(df_daily) <- DAILY_QUESTIONS
df_daily$Days_From_First_Use <- floor(runif(30, 0, 3))

df_test <- df_daily %>%
  group_by(Days_From_First_Use) %>%
  summarize(across(.fns = mean)) %>%
  rename_with(.fn  = function(x) gsub("^Q","Avg_Score_Q",x))
#> `summarise()` ungrouping output (override with `.groups` argument)

df_test  
#> # A tibble: 3 x 6
#>   Days_From_First… Avg_Score_Q1_Da… Avg_Score_Q2_Da… Avg_Score_Q3_Da…
#>              <dbl>            <dbl>            <dbl>            <dbl>
#> 1                0            1.26              1.75             3.02
#> 2                1            0.966             2.14             3.48
#> 3                2            1.08              2.45             3.01
#> # … with 2 more variables: Avg_Score_Q4_Daily <dbl>, Avg_Score_Q5_Daily <dbl>

Created on 2020-12-06 by the reprex package (v0.3.0)

Andrew Brown
  • 1,045
  • 6
  • 13
  • You can probably use the `.names` arg of `across()` instead of `rename`: `summarize(across(all_of(DAILY_QUESTIONS), mean, .names = "Avg_score_{{.col}}"))` – Gregor Thomas Dec 07 '20 at 03:13
  • 2
    Oops, had an extra `{` - try `df_daily %>% group_by(Days_From_First_Use) %>% summarize(across(.fns = mean, .names = "Avg_Score_{.col}"))`, works great! – Gregor Thomas Dec 07 '20 at 03:28
  • Thats good, yeah -- all these methods have a ton of extensible options! – Andrew Brown Dec 07 '20 at 03:55
  • Perfect - thanks a million for the assistance! Removing the additional braces did the trick. I used mutate instead of summarize so as to add new columns to the existing dataframe instead of creating a new dataframe. Here is the final version of my code: `df_daily <- df_daily %>% group_by(Days_From_First_Use) %>% mutate(across(all_of(DAILY_QUESTIONS), mean, .names = "Avg_Score_{.col}"))` – Thomas Philips Dec 07 '20 at 14:16
  • One final enhancement: the prefix does not have to be hardcoded - I can set it using a variable. I start by writing `MY_PREFIX <- "abc"` followed by `df_daily <- df_daily %>% group_by(Days_From_First_Use) %>% mutate(across(all_of(DAILY_QUESTIONS), mean, .names = paste0(MY_PREFIX, "{.col}")))` – Thomas Philips Dec 07 '20 at 18:48