I've tried looking at old threads unsuccessfully. I'm trying to plot the share of male legislators in different parliamentary sessions in a histogram.

This is my code. It works, but it shows the number of legislators, not the share. How can I plot the share? Thanks!

    mergedf %>%
      ggplot(aes(x = session, fill = factor(sex))) +
      geom_histogram(binwidth = 0.5) +
      theme_minimal() +
      theme(legend.position = "bottom") +
      labs(title = "Share of male legislators by session", x = "Session", y = "Share of legislators",
           fill = "sex")

Edit: I get the share of legislators with this table, but I don't know how to integrate it in the histogram.

mergedf %>%
  tabyl(session, sex) %>%
  adorn_percentages() %>%
  adorn_pct_formatting()
stefan
dilly
  • It would be easier to help you if you provided [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), including a snippet of your data or some fake data shared via `dput()`. – stefan Sep 20 '22 at 07:56
  • sorry, good point. Here you go: – dilly Sep 20 '22 at 10:56
  • `sessions <- c(1, 2, 3, 4, 5, 2, 3, 4, 2); sex <- c("female", "female", "female", "male", "female", "female", "female", "male", "male")` – dilly Sep 20 '22 at 10:56

2 Answers

One option is to use dplyr verbs to compute the counts and the share of each sex within each session, and then display the result as a bar chart via geom_col (a histogram is something different: it bins a continuous variable), like so:

mergedf <- data.frame(
  sessions = c( 1, 2, 3, 4, 5, 2, 3, 4, 2),
  sex = c ("female", "female", "female", "male", "female", "female", "female", "male", "male")
)

library(dplyr)
library(ggplot2)

mergedf %>%
  group_by(sessions, sex) %>% 
  summarise(n = n()) %>%
  mutate(pct = n / sum(n)) %>%
  ggplot( aes(x = factor(sessions), y = pct, fill = sex)) +
  geom_col(width = .6)+
  theme_minimal()+
  theme(legend.position ="bottom")+
  labs(title = "Share of male legislators by session", x= "Session", y = "Share of legislators", 
       fill ="sex")
#> `summarise()` has grouped output by 'sessions'. You can override using the
#> `.groups` argument.
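
To make the grouping behind the message above explicit, here is a minimal sketch of the same computation. It assumes the example data frame from this answer; the object name `shares` is only illustrative. `summarise()` drops the last grouping level, so `sum(n)` in the pipeline above is taken per session. The version below states that grouping directly and returns the table that `geom_col` would plot.

```r
library(dplyr)

# Example data from the answer above
mergedf <- data.frame(
  sessions = c(1, 2, 3, 4, 5, 2, 3, 4, 2),
  sex = c("female", "female", "female", "male", "female",
          "female", "female", "male", "male")
)

shares <- mergedf %>%
  count(sessions, sex) %>%        # one row per session/sex combination, count in `n`
  group_by(sessions) %>%          # shares are computed within each session
  mutate(pct = n / sum(n)) %>%    # share of each sex within its session
  ungroup()

shares
```

Within each session the `pct` values sum to 1, which is why the stacked bars all reach the same height.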

stefan

You simply need to specify position = "fill" in your geom_histogram call. It stacks the groups and rescales each bin to sum to 1, so the y-axis shows shares instead of counts:

library(ggplot2)
mergedf <- data.frame(
  session = c( 1, 2, 3, 4, 5, 2, 3, 4, 2),
  sex = c ("female", "female", "female", "male", "female", "female", "female", "male", "male")
)

ggplot(mergedf, aes(x = session, fill = factor(sex))) +
  geom_histogram (binwidth = 0.5, position = "fill") +  # <- HERE
  theme_minimal() +
  theme(legend.position ="bottom") +
  labs(title = "Share of male legislators by session", 
       x= "Session", y = "Share of legislators", fill ="sex")

Technically, you're not really building a histogram (a binned distribution of counts) but a bar plot, so you could alternatively use geom_bar with the same position argument:

ggplot(mergedf, aes(x = session, fill = factor(sex))) +
  geom_bar(position="fill") +
  theme_minimal () +
  theme(legend.position ="bottom") +
  labs(title = "Share of male legislators by session", 
       x= "Session", y = "Share of legislators", fill ="sex")
Ottie