Better way to summarize percentages by subgroup using dplyr?

Question

After consulting similar questions on SO, like here I finally have the output I want, but I can't help wonder if there is a better way to get there. In addition, I am wondering if there is a way to use pipe operators to chain the last step which eliminates repeats of manager and title combinations.

Reproducible example:

library(dplyr)

# Sample data frame
employee = LETTERS[1:18]
manager  = c(rep("Tom", 3), rep("Sue", 4), rep("Mike", 4), rep("Jack", 7))  
title    = c(rep("Entry", 2), rep("Mid", 3), rep("Junior", 7), rep("Senior", 6))

mydata <- data.frame(employee, manager, title)

# Code gives me output I want, but wondering if there is a better way
org2 <-  mydata %>%
  group_by(manager, title) %>%
  mutate(title_count = n()) %>%  # Total number of people with given title by manager
  ungroup() %>%
  group_by(manager) %>%          # Total number of people in manager's group
  mutate(mgr_total = n()) %>%
  group_by(title, add = TRUE) %>%
  mutate(title_pctg = round(title_count/mgr_total*100, 1)) %>%  # Percent of people with given title by manager
  select(-employee)

# Remove duplicates of manager and title to summarize data wanted
org2 <- org2[!duplicated(org2[2:4]), ]

arrange(org2, manager, title)

# A tibble: 7 x 5
# Groups:   manager, title [7]
#  manager  title title_count mgr_total title_pctg
#   <fctr> <fctr>       <int>     <int>      <dbl>
#1    Jack Junior           1         7       14.3
#2    Jack Senior           6         7       85.7
#3    Mike Junior           4         4      100.0
#4     Sue Junior           2         4       50.0
#5     Sue    Mid           2         4       50.0
#6     Tom  Entry           2         3       66.7
#7     Tom    Mid           1         3       33.3

Thanks in advance for the thoughts and help.

score 4 · Accepted Answer · answered Nov 27 '17 at 16:44

You can simplify it as the following by switching the order of group_by (i.e. group by manager first then manger + title instead of the other way);

mydata %>% 
    group_by(manager) %>% 
    mutate(mgr_count = n()) %>% 
    group_by(title, mgr_count, add=TRUE) %>% 
    summarise(
        title_count = n(), 
        title_pctg = round(title_count / first(mgr_count) * 100, 1)
    )

# A tibble: 7 x 5
# Groups:   manager, title [?]
#  manager  title mgr_count title_count title_pctg
#   <fctr> <fctr>     <int>       <int>      <dbl>
#1    Jack Junior         7           1       14.3
#2    Jack Senior         7           6       85.7
#3    Mike Junior         4           4      100.0
#4     Sue Junior         4           2       50.0
#5     Sue    Mid         4           2       50.0
#6     Tom  Entry         3           2       66.7
#7     Tom    Mid         3           1       33.3

Excellent. I wasn't aware of the function "first" and how handy it comes in here. — DaveM, Nov 27 '17 at 17:09

Better way to summarize percentages by subgroup using dplyr?

1 Answers1