After consulting similar questions on SO, like here I finally have the output I want, but I can't help wonder if there is a better way to get there. In addition, I am wondering if there is a way to use pipe operators to chain the last step which eliminates repeats of manager and title combinations.
Reproducible example:
library(dplyr)
# Sample data frame
employee = LETTERS[1:18]
manager = c(rep("Tom", 3), rep("Sue", 4), rep("Mike", 4), rep("Jack", 7))
title = c(rep("Entry", 2), rep("Mid", 3), rep("Junior", 7), rep("Senior", 6))
mydata <- data.frame(employee, manager, title)
# Code gives me output I want, but wondering if there is a better way
org2 <- mydata %>%
group_by(manager, title) %>%
mutate(title_count = n()) %>% # Total number of people with given title by manager
ungroup() %>%
group_by(manager) %>% # Total number of people in manager's group
mutate(mgr_total = n()) %>%
group_by(title, add = TRUE) %>%
mutate(title_pctg = round(title_count/mgr_total*100, 1)) %>% # Percent of people with given title by manager
select(-employee)
# Remove duplicates of manager and title to summarize data wanted
org2 <- org2[!duplicated(org2[2:4]), ]
arrange(org2, manager, title)
# A tibble: 7 x 5
# Groups: manager, title [7]
# manager title title_count mgr_total title_pctg
# <fctr> <fctr> <int> <int> <dbl>
#1 Jack Junior 1 7 14.3
#2 Jack Senior 6 7 85.7
#3 Mike Junior 4 4 100.0
#4 Sue Junior 2 4 50.0
#5 Sue Mid 2 4 50.0
#6 Tom Entry 2 3 66.7
#7 Tom Mid 1 3 33.3
Thanks in advance for the thoughts and help.