
I need to replicate a certain format of a histogram/bar chart. I have already made some modifications with ggplot to group the categorical x-variable and specify the colors as HEX codes.

Here is what I am trying to plot/replicate:

[image: target chart]

Here is a MWE for my data structure:

sex <- sample(0:1, 100, replace=TRUE)
group <- sample(2:5, 100, replace=TRUE)
data <- data.frame(sex, group)

library(ggplot2)
ggplot(data, aes(x = group, group=sex, fill=factor(sex) )) +
  geom_histogram(position="dodge", binwidth=0.45) +                      
  theme(axis.title.x=element_blank(), axis.title.y=element_blank()) +    
  guides(fill=guide_legend(title="sex")) +                        
  scale_y_continuous(labels = scales::percent_format())   +              
  scale_fill_manual(values=c("#b6181f", "#f6b8bb")) 

I get:

[image: resulting plot]

Small things I can't handle are:

  • Replacing the factor labels on the x-axis. There may be a problem with my histogram approach, but I also found no practical way to do it with a bar chart.
  • Rounding the percentages so that they show no decimals.

Most importantly, I don't know how to add a single percentage value for each combination of group and sex to the top of each bar.

I am looking forward to some advice :)

Community
Marco
  • Ok, sorry for this. The initial problem is solved :) – Marco Jan 05 '18 at 14:46
  • Alright, I have rolled back your edits, but you can still see the other versions by clicking on the [revisions](https://stackoverflow.com/posts/48113765/revisions). – Axeman Jan 05 '18 at 14:48

1 Answer


First of all, I would treat your x-axis data as a factor and plot it as bars. For putting the percentage text on top of the bars, see this question: Show % instead of counts in charts of categorical variables. Furthermore, the y-axis percent values aren't a question of rounding: they are not percentages at all, but counts formatted as percentages. y = ..prop.. solves that.
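To see what `..prop..` computes, a minimal sketch like the following may help. It assumes ggplot2 3.3.0 or later, where `after_stat(prop)` is the newer spelling of `..prop..`, and it uses `layer_data()` to look at the values behind the bars. The `set.seed()` call and the names `p`, `computed`, and `prop_sums` are my own additions for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
sex <- sample(0:1, 100, replace = TRUE)
group <- sample(2:5, 100, replace = TRUE)
data <- data.frame(sex, group)

p <- ggplot(data, aes(x = factor(group), y = after_stat(prop), group = sex)) +
  geom_bar(position = "dodge")

# layer_data() returns the data frame ggplot2 computed for the first layer
computed <- layer_data(p, 1)

# prop is calculated within each value of the `group` aesthetic (here: sex),
# so the proportions of each sex add up to 1 across the x categories
prop_sums <- tapply(computed$prop, computed$group, sum)
print(prop_sums)
```

So with `group = sex`, each bar shows the share of that sex falling into the given category, not the share of all observations.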

Is this what you are looking for (I put everything together)?

library(ggplot2)

sex <- sample(0:1, 100, replace = TRUE)
group <- sample(2:5, 100, replace = TRUE)
data <- data.frame(sex, group)

# x-axis labels ("bis" is German for "to")
labs <- c("Score < 7", "Score\n7 bis < 12", "Score\n12 bis < 15",
          "Score\n15 bis < 20", "Score >= 20")

# ..prop.. is computed within each `group` aesthetic, i.e. within each sex
ggplot(data, aes(x = as.factor(group), y = ..prop.., group = sex, fill = factor(sex))) +
  geom_bar(position = "dodge") +
  # stat = "count" makes ..prop.. available for the labels; vjust = 2 puts them inside the bars
  geom_text(aes(label = scales::percent(..prop..)),
            position = position_dodge(width = 0.9), stat = "count", vjust = 2) +
  labs(x = NULL, y = NULL) +
  guides(fill = guide_legend(title = "sex")) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_manual(values = c("#b6181f", "#f6b8bb")) +
  scale_x_discrete(labels = labs)

[image: resulting plot]
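If the labels should sit on top of the bars and show no decimals, as asked in the question, something along these lines might work. This is only a sketch: it assumes ggplot2 3.3.0 or later (for `after_stat()`) and a version of scales whose `percent()` and `percent_format()` accept the `accuracy` argument. The negative `vjust`, the `set.seed()` call, and the name `p_top` are my own choices.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
data <- data.frame(sex = sample(0:1, 100, replace = TRUE),
                   group = sample(2:5, 100, replace = TRUE))

p_top <- ggplot(data, aes(x = factor(group), y = after_stat(prop),
                          group = sex, fill = factor(sex))) +
  geom_bar(position = "dodge") +
  # accuracy = 1 rounds to whole percentages; a negative vjust moves the text above the bar
  geom_text(aes(label = scales::percent(after_stat(prop), accuracy = 1)),
            stat = "count", position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_fill_manual(values = c("#b6181f", "#f6b8bb")) +
  labs(x = NULL, y = NULL, fill = "sex")

print(p_top)
```

The highest label can get clipped at the top of the panel; widening the y-axis a little (for example with `expand_limits()`) should give it room.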

Axeman
giovannotti
  • Good answer! Here are some possible adjustments: you can use `..prop..` instead of `..count.. / sum(..count..)`. Using `vjust = 2` would put the labels in the bars. `position_dodge(width = 0.9)` puts the text in the middle of the bars. (And maybe use some line breaks to avoid code running off screen.) – Axeman Jan 05 '18 at 13:28
  • Wow, great! I just hadn't realized that the exact replication has the problem that the percentages are reported twice and so don't add information. Is it possible to add the absolute number of observations to either the y-axis or the bars? Regards – Marco Jan 05 '18 at 13:35
  • @Marco Just change the `label` in `geom_text` to `label = ..count..` and you get absolute count values – giovannotti Jan 05 '18 at 14:32
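
The suggestion in the last comment could look roughly like this. It is a sketch under a few assumptions: ggplot2 3.3.0 or later, where `after_stat(count)` is the newer spelling of `..count..`, plus a fixed seed and the name `p_counts`, which I made up for the example. The bar heights stay proportions, while the text shows the number of observations.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
data <- data.frame(sex = sample(0:1, 100, replace = TRUE),
                   group = sample(2:5, 100, replace = TRUE))

p_counts <- ggplot(data, aes(x = factor(group), y = after_stat(prop),
                             group = sex, fill = factor(sex))) +
  geom_bar(position = "dodge") +
  # label with the absolute count, while y (inherited) keeps the text at the bar height
  geom_text(aes(label = after_stat(count)), stat = "count",
            position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_fill_manual(values = c("#b6181f", "#f6b8bb")) +
  labs(x = NULL, y = NULL, fill = "sex")

print(p_counts)
```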