If I have this dataframe:

df <- data.frame(
  sex  = factor(c("boy", "boy", "girl", "girl")),
  age  = c(52L, 58L, 40L, 62L),
  b1   = c(25L, 23L, 30L, 26L),
  b2   = c(187L, 220L, 190L, 204L),
  b100 = c(180L, 120L, 155L, 124L)
)

I want to group_by sex and then apply different functions in summarise() to different columns:

calculate the mean of the column "age" (ONLY)

calculate the sd of all columns whose names begin with "b": columns b1, b2, ...

I tried:

df %>% group_by(sex) %>% summarise_at(.vars = c("age", names(df)[substr(names(df), 1, 1) == "b"]),
                                      .funs = c(mean = "mean", sd = "sd"))

but it applies both the "mean" and "sd" functions to all selected columns, which is exactly what I want to avoid.

The result I want is one column mean_age plus the columns sd_b1, sd_b2, ...

Is that possible with dplyr? Or must I do it in two steps, like:

df %>% group_by(sex) %>% summarise(mean_age = mean(age))

df %>% group_by(sex) %>% summarise_at(.vars = names(df)[substr(names(df), 1, 1) == "b"],
                                      .funs = c(sd = "sd"))

Thank you.

DD chen
  • Please edit your question and fix your code, as it is producing errors. It might help to also include your expected output (and how this output is incorrect). – r2evans Jun 03 '19 at 16:27
  • @r2evans I edited my question and deleted the code I had tried, because it made no sense. Maybe it's less confusing now. – DD chen Jun 03 '19 at 16:32
  • But now it's unclear what the difficulty is: it seems like a textbook case of `group_by` + `summarize_at`. – divibisan Jun 03 '19 at 16:34
  • @divibisan I hope it's more clear. – DD chen Jun 03 '19 at 16:41
  • Is the problem that you're not selecting the variables to summarize correctly? The whole point of using these scoped dplyr functions is that you can use `?select_helpers` to select variables. So just: `vars(starts_with('b'))`. Take a look at `?summarize_at` and `?select_helpers` – divibisan Jun 03 '19 at 16:43
  • @divibisan Thank you. So in summarize_at(), the function (sum or mean) is applied to all selected columns. I can't, for example, apply sum to columns 1-10 and then mean to columns 11-20? – DD chen Jun 04 '19 at 08:40
  • Sure, just use different `summarize_at` statements for each distinct group of variables – divibisan Jun 04 '19 at 14:42
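
For illustration, the suggestion in the comments could be sketched roughly as follows. This is not an answer posted in the thread. It assumes the example data frame from the question (rebuilt here so the snippet is self-contained), and the names `mean_part`, `sd_part`, `two_step` and `one_step` are made up for the example. The second variant assumes dplyr 1.0.0 or later, where `across()` is available.

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet runs on its own
df <- data.frame(
  sex  = factor(c("boy", "boy", "girl", "girl")),
  age  = c(52L, 58L, 40L, 62L),
  b1   = c(25L, 23L, 30L, 26L),
  b2   = c(187L, 220L, 190L, 204L),
  b100 = c(180L, 120L, 155L, 124L)
)

# Variant 1: one summary per group of variables, then join on the grouping key
mean_part <- df %>%
  group_by(sex) %>%
  summarise(mean_age = mean(age))

sd_part <- df %>%
  group_by(sex) %>%
  summarise_at(vars(starts_with("b")), list(sd = sd))

two_step <- inner_join(mean_part, sd_part, by = "sex")

# Variant 2 (assumes dplyr >= 1.0.0): a single summarise() call, with across()
# applying sd only to the columns whose names start with "b"
one_step <- df %>%
  group_by(sex) %>%
  summarise(
    mean_age = mean(age),
    across(starts_with("b"), sd, .names = "sd_{.col}")
  )
```

In the second variant, the `.names` pattern is what produces the `sd_b1`, `sd_b2`, ... column names asked for in the question.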