
I would like to create a histogram where the y-axis shows the percentage per facet in ggplot2. I have seen several similar questions but some answers seem outdated or they show the percentage of all observations rather than per facet.

I tried this:

library(ggplot2)
library(scales)

ggplot(mtcars, aes(mpg))+
    facet_grid(cyl ~ am)+
    stat_count(aes(y=..prop..)) +
    theme_bw()+
    scale_y_continuous(labels = percent_format())

Which seems to work, except that the binwidth is not fixed. Facets with few observations have large bars.

How could I fix the binwidth?

EDIT: Solution adapted from ACNB. I overlooked it before, and I just saw that Andrey Kolyadin was quicker to provide a more concise solution.

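library(dplyr)  # provides %>%, group_by(), mutate(), summarise()

binwidth <- 1
mtcars.stats <- mtcars %>%
    group_by(cyl, am) %>%
    # label each bin by its midpoint; n is the number of cars in the facet
    mutate(bin = cut(mpg, breaks=seq(0,35, binwidth),
                     labels = seq(0 + binwidth, 35, binwidth)-(binwidth/2)),
           n = n()) %>%
    group_by(cyl, am, bin) %>%
    # share of the facet's observations that fall into each bin
    summarise(p = n()/n[1]) %>%
    ungroup() %>%
    mutate(bin = as.numeric(as.character(bin)))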

ggplot(mtcars.stats, aes(x = bin, y= p)) + 
    geom_col() + 
    facet_grid(cyl~am)+
    theme_bw()+
    scale_y_continuous(labels = percent_format())
bee guy
  • I did something similar a few weeks ago and found it easier to use `gridExtra` instead of `facet_grid` to set stuff like binwidth individually. – LAP Nov 15 '17 at 09:42
  • `stat_count(aes(y = ..prop..), width = 1)` – pogibas Nov 15 '17 at 09:42
  • Do I have to worry about this warning: `position_stack requires non-overlapping x intervals`? – bee guy Nov 15 '17 at 10:15

2 Answers


As always, I advise not relying on the statistics layer of ggplot2 and instead calculating the necessary statistics before plotting:

library('zoo')
library('tidyverse')
library('scales')  # for percent_format() in the plot

# Selecting breaks
breaks <- seq.int(min(mtcars$mpg), max(mtcars$mpg), length.out = 19)

# Calculating per-facet bin counts, then converting them to proportions
mt_hist <- mtcars %>% 
  group_by(cyl, am) %>% 
  summarise(x = list(rollmean(breaks, 2)),  # bin midpoints
            count = list(hist(mpg, breaks = breaks, plot = FALSE)$counts)) %>% 
  unnest(cols = c(x, count)) %>% 
  group_by(cyl, am) %>% 
  mutate(count = count/sum(count))
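
As a quick sanity check on this approach (an illustrative sketch in base R, not part of the original answer), the proportions within each `cyl`/`am` facet should add up to 1. The names `props` and `facet_totals` are made up for this example; the breaks are the same as above.

```r
# Same breaks as in the answer above
breaks <- seq.int(min(mtcars$mpg), max(mtcars$mpg), length.out = 19)

# Per-facet proportions: bin counts divided by the number of cars in that cyl/am cell
props <- lapply(split(mtcars$mpg, list(mtcars$cyl, mtcars$am)), function(mpg) {
  counts <- hist(mpg, breaks = breaks, plot = FALSE)$counts
  counts / sum(counts)
})

# Each facet total should be 1 (i.e. 100%), up to floating-point error
facet_totals <- sapply(props, sum)
print(facet_totals)
```

If a total differs from 1, the percentages are probably being computed over all observations rather than per facet.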

And the plot itself:

ggplot(mt_hist)+
  aes(x = x,
      y = count)+
  geom_col()+
  facet_grid(cyl ~ am)+
  theme_bw()+
  scale_y_continuous(labels = percent_format())


Andrey Kolyadin
  • Good call, I was going to reply but you've done a much better solution - great advice too, to do a bit of data munging beforehand rather than the ggplot stats layer (which is what I was attempting!) – sorearm Nov 15 '17 at 11:30

Have you tried adding geom_histogram with the stat argument? Something like:

p <- ggplot(mtcars, aes(mpg))
p <- p + geom_histogram(stat = 'bin')
p <- p + facet_grid(cyl ~ am)
p <- p + stat_count(aes(y=..prop..))
p <- p +   theme_bw()
p <- p +   scale_y_continuous(labels = percent_format())
p
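
A possible variant that keeps `geom_histogram` and fixes the bin width, as a minimal sketch: it assumes ggplot2 3.3.0 or later (for `after_stat()`) and picks `binwidth = 1` only to match the question's EDIT. `stat_bin` computes `density` separately for each panel, so `density * width` should give each bin's share of the observations in its own facet.

```r
library(ggplot2)
library(scales)

# Fixed-width bins; density * width rescales each panel's bars to proportions
p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(aes(y = after_stat(density * width)), binwidth = 1) +
  facet_grid(cyl ~ am) +
  theme_bw() +
  scale_y_continuous(labels = percent_format())
p
```

Because the bins are computed by `stat_bin` rather than `stat_count`, all facets share the same bin width regardless of how many observations they contain.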
sorearm