I would like to make stacked (facet_grid) size histograms in ggplot2, one per Year. The years have different sample sizes. I have not been able to get ..density.. to produce correct proportions for each histogram bin, so I've been using ..count../(sample size) instead. As I understand stat transformations such as ..count.., you cannot combine them with an operation on an object (e.g. nrow(data)). How can I get these stacked histograms to show proportions when the sample sizes differ? The format in the code below produces a figure that matches other figures in a report, which is why I would like to stick with ggplot2, though other packages may also work. Here is an example:

library(ggplot2)

d1 <- as.data.frame(round(rnorm(121, 86, 28), 0))
colnames(d1) <- "Length"
d1$Year <- "2015"

d2 <- as.data.frame(round(rnorm(86, 70, 32), 0))
colnames(d2) <- "Length"
d2$Year <- "2016"

D <- rbind(d1, d2)

ggplot(D, aes(x = Length)) +
  geom_histogram(aes(y = ..count../nrow(D)), 
                 breaks=seq(0, 160, by = 3), 
                 col="black", 
                 fill="grey48", 
                 alpha = .8)+
  labs(title = "Size by Year", x = "Length", y = "frequency") +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  theme_bw() + 
  theme(text = element_text(size=16), 
        axis.text.y = element_text(size=12)) +
  geom_vline(aes(xintercept = 95.25), 
             colour = "red", size = 1.3)+
  facet_grid(Year ~ .)

The expression ..count../nrow(D) won't work here: it divides by the total number of rows, whereas each panel needs the sample size for its own year once I facet with facet_grid(Year ~ .).

Z.Lin
KVininska
  • Does the answer [here](https://stackoverflow.com/questions/16339204/normalizing-faceted-histograms-separately-in-ggplot2) work for you? – Z.Lin Sep 19 '18 at 23:59

1 Answer

Is this what you are looking for? You didn't specify what went wrong when you used ..density.., but it seems like you just need to scale by the binwidth. ..density.. scales so that the total bar area is 1, meaning that each bar has height ..count.. / (n * binwidth). You just want the height to be ..count.. / n, which is ..density.. * binwidth. So set the binwidth manually (you should do this anyway) and multiply by it.

set.seed(1234)
d1 <- as.data.frame(round(rnorm(121, 86, 28), 0))
colnames(d1) <- "Length"
d1$Year <- "2015"

d2 <- as.data.frame(round(rnorm(86, 70, 32), 0))
colnames(d2) <- "Length"
d2$Year <- "2016"

D <- rbind(d1, d2)

library(ggplot2)
ggplot(D, aes(x = Length)) +
  geom_histogram(aes(y = ..density.. * 5), binwidth = 5) +
  geom_vline(aes(xintercept = 95.25), colour = "red", size = 1.3) +
  facet_grid(Year ~ .) +
  labs(title = "Size by Year", x = "Length", y = "frequency") +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  theme_bw() +
  theme(
    text = element_text(size = 16),
    axis.text.y = element_text(size = 12)
  )

Created on 2018-09-19 by the reprex package (v0.2.0).
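
The identity used above, ..count.. / n = ..density.. * binwidth, can be checked numerically without drawing anything. The following is a minimal sketch that uses base R's hist() rather than ggplot2, so it only illustrates the arithmetic and is not what geom_histogram does internally. The names binwidth, breaks, and props are made up for this example, and the breaks are chosen to span the full data range so that no observation is dropped.

```r
# Per-year check that count / n equals density * binwidth.
set.seed(1234)
D <- data.frame(
  Length = c(round(rnorm(121, 86, 28)), round(rnorm(86, 70, 32))),
  Year   = rep(c("2015", "2016"), times = c(121, 86))
)

binwidth <- 5  # hypothetical bin width, in the units of Length

# Shared breaks that cover the whole range of Length.
breaks <- seq(floor(min(D$Length) / binwidth) * binwidth,
              ceiling(max(D$Length) / binwidth) * binwidth,
              by = binwidth)

# For each year, compute the bin proportions both ways.
props <- lapply(split(D$Length, D$Year), function(x) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  list(from_count   = h$counts / length(x),    # count / n
       from_density = h$density * binwidth)    # density * binwidth
})

# The two versions agree, and each year's proportions sum to 1.
sapply(props, function(p) isTRUE(all.equal(p$from_count, p$from_density)))
sapply(props, function(p) sum(p$from_count))
```

Because each year is divided by its own n, the proportions within a year sum to 1 regardless of how many fish were measured that year.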

Calum You
  • How do you choose binwidth? It changes the scale. I provided test data, but when I run this with my real data, the bin frequencies do not add up to 1. I also get two warnings: "Ignoring unknown aesthetics: binwidth, bins" and "'stat_bin()' using 'bins = 30'. Pick better value with binwidth." I tried binwidths that varied by orders of magnitude and none of them made sense. – KVininska Sep 20 '18 at 15:48
  • binwidth is not an aesthetic, it's a parameter of geom_histogram. Check your parentheses – Calum You Sep 20 '18 at 21:56
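
To make the parenthesis point concrete, here is a small sketch of where binwidth goes. It assumes ggplot2 3.3.0 or later, because it uses after_stat() in place of the older ..density.. notation. The variable bw and the object names p and bars are made up for this example. ggplot_build() is used only to inspect the computed bar heights.

```r
library(ggplot2)

set.seed(1234)
D <- data.frame(
  Length = c(round(rnorm(121, 86, 28)), round(rnorm(86, 70, 32))),
  Year   = rep(c("2015", "2016"), times = c(121, 86))
)

bw <- 5  # hypothetical bin width, in the units of Length

# Not this: binwidth inside aes() is treated as an unknown aesthetic and
# ignored, so stat_bin() falls back to its default of 30 bins.
#   geom_histogram(aes(y = after_stat(density * bw), binwidth = bw))

# binwidth is passed to geom_histogram() itself, outside aes().
p <- ggplot(D, aes(x = Length)) +
  geom_histogram(aes(y = after_stat(density * bw)), binwidth = bw) +
  facet_grid(Year ~ .)

# Bar heights as computed by the stat; summed within each panel.
bars <- ggplot_build(p)$data[[1]]
tapply(bars$y, bars$PANEL, sum)
```

If the per-panel sums are not 1 with real data, one thing worth checking is that the multiplier in aes() and the binwidth argument are the same number, which is why the sketch keeps both in a single variable.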