2

I am trying to calculate row percentages by demographics of various score levels--in my data, that would be what % of white people (or % of black people, or % male, or % who have education level 2, and so on) have a score of 0 (or 1, 2, or 3)--and then use that to create a big plot.

So in my example data below, 8.33% of race == 1 (which is white) have a score of 0, 25% have a score of 1, 25% have a score of 2, and 41.67% have a score of 3.

Then the ultimate end goal would be to get some type of bar plot where the 4 levels of 'score' are across the x axis, and the various comparisons of demographics run down the y axis. Something that looks visually sort of like this, but with the levels of 'score' across the top instead of education levels: enter image description here .

I already have some code to make the actual figure, which I've done in other instances but with externally/already-calculated percentages:

ggplot(data, aes(x = percent, y = category, fill = group)) +
  geom_col(orientation = "y", width = .9) +
  facet_grid(group~score_var, 
             scales = "free_y", space = "free_y") +
  labs(title = "Demographic breakdown of 'Score'") +
  theme_bw()

I am just struggling to figure out the best way to calculate these row percentages, presumably using group_by() and summarize and then storing or configuring them in a way that they can be plotted. Thank you.

d <- structure(list(race = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 
1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 
3, 3), gender = c(0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 
0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1
), education = c(1, 3, 3, 2, 1, 3, 2, 3, 4, 4, 2, 3, 3, 2, 3, 
4, 1, 3, 1, 3, 3, 2, 1, 3, 2, 3, 4, 4, 2, 3, 3, 2, 3, 4, 1, 3
), score = c(1, 2, 2, 1, 2, 3, 3, 2, 0, 0, 1, 2, 1, 3, 0, 0, 
3, 3, 3, 3, 3, 3, 3, 3, 2, 1, 2, 3, 1, 3, 3, 0, 1, 2, 2, 0)), row.names = c(NA, 
-36L), spec = structure(list(cols = list(race = structure(list(), class = c("collector_double", 
"collector")), gender = structure(list(), class = c("collector_double", 
"collector")), education = structure(list(), class = c("collector_double", 
"collector")), score = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x000001bd978b0df0>, class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))
a_todd12
  • 449
  • 2
  • 12

2 Answers2

1

This may get you started:

library(dplyr)
library(ggplot2)
prop <- data %>% 
    mutate(race = factor(race, levels = c(1, 2, 3), labels = c("White", "Black", "Others"))) %>% 
    group_by(race) %>% 
    mutate(race_n  = n()) %>% 
    group_by(race, score) %>% 
    summarise(percent = round(100*n()/race_n[1], 1))

prop %>% 
    ggplot(aes(x = percent, y = score, fill = race)) +
    geom_col(orientation = "y", width = .9) +
    geom_text(aes(label = percent), hjust = 1)+
    facet_grid(~race) +
    labs(title = "Demographic breakdown of 'Score'") +
    theme_bw() 

enter image description here

Edit

To put all characters together, you can get a bigger graph:

df <- data %>% mutate(
        gender = factor(2-gender), 
        race = factor(race), 
        education = factor(education)) %>%
    pivot_longer(!score, names_to = "character", values_to = "levels")

df %>% group_by(character, levels) %>% 
    mutate(group_n  = n()) %>% 
    group_by(character, levels, score) %>% 
    summarise(percent = round(100*n()/group_n[1], 1)) %>% 
    ggplot(aes(x = percent, y = score, fill = character)) +
    geom_col(orientation = "y", width = .9) +
    geom_text(aes(label = percent), hjust = 1)+
    facet_grid(character ~ levels) +
    labs(title = "Demographic breakdown of 'Score'") +
    theme_bw() 

please note: I have changed the code for gender.

enter image description here

Zhiqiang Wang
  • 6,206
  • 2
  • 13
  • 27
  • Thanks @Zhiqiang Wang, this is actually basically close to where I ended up with my first attempt--where I really got stuck was in trying to repeat this process for each of the demographic vars and combining them all into one data frame that could be piped into ggplot. But I appreciate this very much! – a_todd12 Feb 07 '23 at 02:20
  • The method would be the same. Just set-up data for it. – Zhiqiang Wang Feb 07 '23 at 04:35
  • Thanks for this--it's close but not exactly how I was thinking of it. I will tinker with it and update my post if I figure it out exactly. – a_todd12 Feb 07 '23 at 14:49
0

Taking inspiration from @Zhiqiang Wang's excellent first pass at this, I finally figured out a solution. I still need to change the order of the labels (to put the education levels in order, and move the race variables to the top of the figure) but this is basically what I was envisioning.

d_test <- d %>% mutate(
        gender = factor(2-gender), 
        race = factor(race), 
        education = factor(education)) %>%
    pivot_longer(!score, names_to = "group", values_to = "levels")


d_test <- d_test %>% group_by(group, levels) %>% 
    mutate(group_n  = n()) %>% 
    group_by(group, levels, score) %>% 
    summarise(percent = round(100*n()/group_n[1], 1))

d_test <- d_test %>% 
  mutate(var = case_when(group == "gender" & levels == 1 ~ "female",
                         group == "gender" & levels == 2 ~ "male",
                         group == "race" & levels == 1 ~ "white",
                         group == "race" & levels == 2 ~ "black",
                         group == "race" & levels == 3 ~ "hispanic",
                         group == "education" & levels == 1 ~ "dropout HS",
                         group == "education" & levels == 2 ~ "grad HS",
                         group == "education" & levels == 3 ~ "some coll",
                         group == "education" & levels == 4 ~ "grad coll"))

ggplot(d_test, aes(x = percent, y = var, fill = group)) +
  geom_col(orientation = "y", width = .9) +
  facet_grid(group ~ score,
               scales = "free_y", space = "free_y") +
  labs(title = "Demographic breakdown of 'Score'",
         y = "",
         x = "Percent") +
  theme_minimal() +
  theme(legend.position = "none",
        strip.text.y = element_blank())

enter image description here

a_todd12
  • 449
  • 2
  • 12