Getting a summary by many groups and overall using dplyr

Question

In Getting summary by group and overall using tidyverse, I asked a related question that extended the original problem: what if I have two grouping variables and want to summarize over only one of them? What about three, or four? An answer is also provided in that post.

I am posting it here as a separate post so that other users may more easily find the answer.

score 0 · Answer 1 · answered Jan 25 '22 at 22:11

This may be easier with groupingsets, cube, or rollup from data.table:

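library(data.table)
# cube() summarizes every combination of the by columns, using NA for a collapsed column;
# keeping rows with non-missing obese leaves the sex x obese and obese-only levels
cube(as.data.table(dsn), j = mean(age),
     by = c("sex", "obese"))[!is.na(obese)][order(sex)]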
  sex obese       V1
1:    F FALSE 23.98792
2:    F  TRUE 23.98330
3:    M FALSE 20.00341
4:    M  TRUE 19.97381
5: <NA> FALSE 21.74211
6: <NA>  TRUE 22.29040
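cube() computes all four levels (sex by obese, sex alone, obese alone, and the grand total), and the [!is.na(obese)] filter then throws two of them away. A minimal sketch, assuming a reasonably recent data.table, that asks groupingsets() for only the two levels in the OP's output. The data are regenerated inline so the block runs on its own; dsn_dt and res are illustrative names, and list(mean_age = ...) is used only to give the result column a readable name instead of V1.

```r
library(data.table)

# Same simulated data as in the question, built as a data.table
set.seed(3243242)
n <- 100
dsn_dt <- data.table(
  obese = sample(c(TRUE, FALSE), size = n, replace = TRUE),
  sex   = sample(c("M", "F"), size = n, replace = TRUE)
)
# data.table() cannot refer to sex while building it, so add age afterwards
dsn_dt[, age := rnorm(n, mean = 20 + 4 * (sex == "F"), sd = 0.1)]

# Request exactly the two levels wanted: sex x obese, and obese alone.
# In the obese-only rows, sex is NA.
res <- groupingsets(
  dsn_dt,
  j    = list(mean_age = mean(age)),
  by   = c("sex", "obese"),
  sets = list(c("sex", "obese"), "obese")
)
print(res[order(sex, obese)])
```

For three or four grouping variables, only sets grows: add one character vector per level you want, e.g. list(c("a", "b", "c"), c("b", "c"), "c") for hypothetical columns a, b and c. Every name in sets must also appear in by.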

Checking against the OP's output:

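library(dplyr)
# summarize one grouped data frame; add other dplyr verbs here as needed, e.g. arrange() or mutate()
find_summary <- function(df_group){
  df_group %>%
    summarize(mean_age = mean(age))
}
# stack the sex-by-obese summary and the obese-only summary;
# bind_rows() fills the missing sex column of the second one with NA
bind_rows(
  find_summary(group_by(dsn, sex, obese)),
  find_summary(group_by(dsn, obese))
) %>% as.data.frame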

   sex obese mean_age
1    F FALSE 23.98792
2    F  TRUE 23.98330
3    M FALSE 20.00341
4    M  TRUE 19.97381
5 <NA> FALSE 21.74211
6 <NA>  TRUE 22.29040
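The find_summary() pattern extends to three, four or more grouping variables by looping over a list of grouping sets instead of writing one bind_rows() argument per level. A minimal sketch, assuming dplyr 1.0 or later (for across() and all_of()); summarize_by_sets() is a hypothetical helper, not part of dplyr, and character(0) is used here to mean the overall, ungrouped summary.

```r
library(dplyr)

# Hypothetical helper: summarize df once per grouping set and stack the results.
# Each element of sets is a character vector of grouping column names;
# character(0) means "no grouping", i.e. the overall summary.
summarize_by_sets <- function(df, sets, ...) {
  results <- lapply(sets, function(vars) {
    grouped <- if (length(vars) > 0) group_by(df, across(all_of(vars))) else df
    summarize(grouped, ..., .groups = "drop")
  })
  # bind_rows() fills grouping columns absent from a given set with NA
  bind_rows(results)
}

# Same simulated data as in the question
set.seed(3243242)
dsn <- tibble(
  obese = sample(c(TRUE, FALSE), size = 100, replace = TRUE),
  sex   = sample(c("M", "F"), size = 100, replace = TRUE),
  age   = rnorm(n = 100, mean = 20 + 4 * (sex == "F"), sd = 0.1)
)

# sex x obese, obese alone, and the overall mean
res <- summarize_by_sets(
  dsn,
  sets = list(c("sex", "obese"), "obese", character(0)),
  mean_age = mean(age)
)
print(as.data.frame(res))
```

With more grouping variables only sets changes, e.g. list(c("a", "b", "c"), c("a", "b"), "a", character(0)) for hypothetical columns a, b and c. Listing the largest set first keeps the grouping columns ahead of the summary columns, because bind_rows() takes its column order from the first data frame.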

data

library(dplyr)  # tibble() is re-exported by dplyr
set.seed(3243242)
dsn <- tibble(
  obese = sample(c(TRUE, FALSE), size = 100, replace = TRUE),
  sex = sample(c("M", "F"), size = 100, replace = TRUE),
  age = rnorm(n = 100, mean = 20 + 4*(sex == "F"), sd = 0.1)
)
