R dplyr getting percentage after multiple groupby columns

Question

I have done a group by using three columns now I want to get percentages for each group:

   GENDER      preference                           option num_users
   <fct>       <fct>                               <int>     <int>
 1 Female      Sweet                                   1       136
 2 Female      Sweet                                   2        28
 3 Female      Don't Know                              1        18
 4 Female      Don't Know                              2         2
 5 Female      Medium Spicy                            1        62
 6 Female      Medium Spicy                            2         6
 7 Female      Spicy                                   1        84
 8 Female      Spicy                                   2        20
 9 Female      Hot                                     1        35
10 Female      Hot                                     2         5
# ... with 17 more rows

I managed to use transmute when I only use Gender and Option to group but after adding the preference transmute doesn't work.

Here is my approach without preference column in the group by:

grouped_df <- df %>% 
group_by(GENDER, option) %>% 
summarize(num_users = n()) %>% 
spread(GENDER, num_users) %>% 
ungroup() %>% 
transmute(option_id = option, 
  female_percent = Female/(Female + Male), 
  male_percent = Male / (Female + Male)) %>% 
mutate(female_percent = round(100 * female_percent), 
  male_percent = round(100 * male_percent))

How can I use preference in the above approach?

Could you please provide the data in a [reproducible](https://stackoverflow.com/q/5963269/1422451) format? — Hack-R, Jul 10 '18 at 13:32
What are you trying to find the `gender` percent of if you are grouping by `gender` itself? Please add current result and desired result. — phil_t, Jul 10 '18 at 13:46

score 1 · Answer 1 · answered Jul 10 '18 at 14:22

I think this is what you want, I had to create a data frame from scratch since you didn't provide the code to do so but the idea is the same.

set.seed(1)

df <- data.frame(gender = rep(c("Male", "Female"), each = 10),
           preference = rep(c("Sweet", "Don't Know", "Medium Spicy", "Spicy", "Hot"), 2),
           option = rep(c(1, 2), 2),
           num_users = sample(1:150, 20))

df %>% 
  group_by(gender, option) %>% 
  mutate(perc = prop.table(num_users) * 100) %>%
  select(-num_users) %>% 
  spread(preference, perc)

# A tibble: 4 x 7
# Groups:   gender, option [4]
  gender option `Don't Know`   Hot `Medium Spicy` Spicy Sweet
  <fct>   <dbl>        <dbl> <dbl>          <dbl> <dbl> <dbl>
1 Female      1        22.8  24.7            33.6  12    6.82
2 Female      2         6.58 26.8            34.7  13.9 17.9 
3 Male        1        35.9   7.85           22.3  23.6 10.5 
4 Male        2        13.2   2.12           22.4  31.5 30.8

R dplyr getting percentage after multiple groupby columns

1 Answers1