I want to create a histogram that presents the proportion of females (y axis) for each age group (x-axis). I want to have two bars for each age group that represent females with disease "N" and without disease "N".

Data:

example data

Other posts related to this topic that I have reviewed:

r percentage by bin in histogram ggplot

Barplots with multiple factor groupings and mean of variable across those factors

Code I have tried:

ggplot(N_group, aes(x=Age_2, fill=Sex))+
  geom_bar(aes( y=..count../tapply(..count.., ..x.. ,sum)[..x..]), position="dodge" ) +
  geom_text(aes( y=..count../tapply(..count.., ..x.. ,sum)[..x..], 
label=scales::percent(..count../tapply(..count.., ..x.. ,sum)[..x..]) ),
            stat="count", position=position_dodge(0.9), vjust=-0.5)

This compares males and females that have disease "N".

sar
  • Have you seen this [post](https://stackoverflow.com/help/minimal-reproducible-example)? As asked, the question is not reproducible. Please do not post screenshots of the data. – mnm Apr 13 '20 at 00:08

1 Answer

One possible solution is to calculate the proportions outside of ggplot2.

Here is an example using the following fake data frame:

df <- data.frame(ID = 1:40,
                 N = sample(c(0,1),40,replace = TRUE),
                 age_group = sample(1:4,40, replace = TRUE),
                 sex = sample(c("M","F"),40,replace = TRUE))

Using the dplyr package, you can count each combination of sex, age_group and N, keep only the females, and then calculate the proportion of each N group within each age_group:

library(dplyr)

df %>% 
  #group_by(sex, age_group, N, .drop = FALSE) %>% 
  count(sex, age_group, N) %>% 
  filter(sex =="F") %>%
  group_by(age_group) %>%
  mutate(Percent = n / sum(n))


# A tibble: 8 x 5
# Groups:   age_group [4]
  sex   age_group     N     n Percent
  <fct>     <int> <dbl> <int>   <dbl>
1 F             1     0     1   0.167
2 F             1     1     5   0.833
3 F             2     0     2   0.4  
4 F             2     1     3   0.6  
5 F             3     0     2   0.4  
6 F             3     1     3   0.6  
7 F             4     0     1   0.5  
8 F             4     1     1   0.5 

Piping this result into ggplot2 gives the following graph:

library(dplyr)
library(ggplot2)

df %>% 
  count(sex, age_group, N) %>% 
  filter(sex =="F") %>%
  group_by(age_group) %>%
  mutate(Percent = n / sum(n)) %>%
  ggplot(aes(x = age_group, y = Percent, fill = factor(N)))+
  geom_col(position = position_dodge())+
  scale_y_continuous(labels = scales::percent)

enter image description here
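
The attempt in the question also printed a percentage above each bar. As a hedged sketch (not part of the original answer), the same labels could be added to the pre-computed proportions with `geom_text()`. The seed, the `plot_data` and `p` names, and the axis titles are illustrative assumptions; `dplyr::count()` and `dplyr::mutate()` are namespaced only as a precaution against other packages masking them.

```r
library(dplyr)
library(ggplot2)

# Fake data in the same shape as the example above (seed chosen arbitrarily)
set.seed(1)
df <- data.frame(ID = 1:40,
                 N = sample(c(0, 1), 40, replace = TRUE),
                 age_group = sample(1:4, 40, replace = TRUE),
                 sex = sample(c("M", "F"), 40, replace = TRUE))

# Proportion of each N group among females, within each age group
plot_data <- df %>%
  dplyr::count(sex, age_group, N) %>%
  filter(sex == "F") %>%
  group_by(age_group) %>%
  dplyr::mutate(Percent = n / sum(n)) %>%
  ungroup()

# Dodged bars with a percentage label above each bar;
# the same dodge width is used for bars and labels so they line up
p <- ggplot(plot_data,
            aes(x = factor(age_group), y = Percent, fill = factor(N))) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = scales::percent(Percent, accuracy = 1)),
            position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Age group", y = "Proportion of females", fill = "Disease N")

print(p)
```

Converting `age_group` to a factor here is a design choice that gives one discrete tick per group; leaving it numeric, as in the answer, also works.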

Does this answer your question?

dc37
  • Thanks @dc37. I get this error with your code (both with your data and mine): `Error in count(., sex, age_group, N) : unused argument (N)` – sar Apr 13 '20 at 22:04
  • Weird, because with my example I don't have this issue; everything runs smoothly. Can you show the output of `str(df)`? – dc37 Apr 14 '20 at 01:26
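
The `unused argument (N)` message is consistent with a different `count()` masking the dplyr one. For instance, `plyr::count()` only takes `df`, `vars` and `wt_var`, so a fourth positional argument would fail in this way. That is an assumption about the asker's session, not something confirmed in the thread. A minimal sketch that sidesteps such masking by calling the dplyr functions with their namespace (the seed and the `female_props` name are illustrative):

```r
library(dplyr)

# Fake data in the same shape as the answer's example (seed chosen arbitrarily)
set.seed(1)
df <- data.frame(ID = 1:40,
                 N = sample(c(0, 1), 40, replace = TRUE),
                 age_group = sample(1:4, 40, replace = TRUE),
                 sex = sample(c("M", "F"), 40, replace = TRUE))

# count() and mutate() are the two verbs used here that plyr also exports,
# so both are called as dplyr:: to make sure the dplyr versions run
female_props <- df %>%
  dplyr::count(sex, age_group, N) %>%
  filter(sex == "F") %>%
  group_by(age_group) %>%
  dplyr::mutate(Percent = n / sum(n)) %>%
  ungroup()

print(female_props)
```

In the affected session, `environment(count)` or `conflicts()` would show which package's `count()` is actually being picked up.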