How to overlay a histogram and density plot with counts (numbers) on the y-axis instead of density

Question

I have a histogram created in ggplot2, and I'd like to overlay a density line for the same data. Importantly, I don't want to convert the histogram to density values; I want to keep N (counts) on the y-axis. Is there a way to overlay the histogram and the density plot without transforming the histogram, by scaling up the density curve instead?

The histogram for this data:

The initial density plot for the same data:

The desired overlay, but with density on the y-axis instead of counts:

[See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. Right now we've got neither code nor data, so it's hard to do more than guess what you're doing exactly — camille, Oct 08 '19 at 02:29

Aaron left Stack Overflow · Answer 1 · score 7 · 2019-10-08T13:03:22.473

You'll want to use the computed variable `..count..` created by `stat_density`, and then scale it by the bin width.

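library(ggplot2)
set.seed(15)
df <- data.frame(x = rnorm(500, sd = 10))
bin_w <- 5  # histogram bin width, reused to scale the density curve

ggplot(df, aes(x = x)) +
  geom_histogram(colour = "black", fill = "white", binwidth = bin_w) +
  # ..count.. is density * n (counts per unit of x); multiplying by the
  # bin width gives counts per bin, matching the histogram's y-axis.
  # Newer ggplot2 versions write this as after_stat(count * bin_w).
  geom_density(aes(y = ..count.. * bin_w), alpha = .2, fill = "#FF6666")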

edited Oct 08 '19 at 13:03

answered Oct 08 '19 at 02:14

Thanks a lot! Could I ask why we normalize by the binwidth? And where exactly does the construction "..count.." come from? It's an output of the ggplot + geom_histogram function, right? And how do you use this ".." double-dot notation? Sorry for these basic questions, and thanks a lot again for your answer; that's exactly what I was looking for! – Eugene Oct 08 '19 at 14:58

`..count..` is created by `stat_density`, which is the default stat for `geom_density`; you use it like any other variable. I don't know exactly what happens behind the scenes, but see the documentation for `stat_density` for the variables it creates. And you scale by the binwidth because `..count..` is the count per unit on the x-axis, whereas the histogram shows the count per bin. – Aaron left Stack Overflow Oct 08 '19 at 21:36
If this answers your question, click the check mark to let others know. Thanks! – Aaron left Stack Overflow Oct 08 '19 at 21:38
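To see the unit argument from the comment above in numbers, here is a minimal base-R sketch using the same simulated data. It assumes, as the `stat_density` documentation describes, that `..count..` is the density estimate multiplied by the number of points; `stats::density()` stands in for `stat_density` here (both default to the `nrd0` bandwidth), so this approximates what ggplot2 computes rather than reproducing its internals. The names `bin_w`, `trap`, `hist_area` and `curve_area` are illustrative only.

```r
# Base-R check: density * n * bin width has the same area as a count histogram.
set.seed(15)
x     <- rnorm(500, sd = 10)
n     <- length(x)
bin_w <- 5                                   # histogram bin width

# Count histogram with bins of width bin_w covering all the data
breaks <- seq(floor(min(x) / bin_w) * bin_w,
              ceiling(max(x) / bin_w) * bin_w, by = bin_w)
h <- hist(x, breaks = breaks, plot = FALSE)
hist_area <- sum(h$counts) * bin_w           # equals n * bin_w

# density()$y integrates to 1; times n gives counts per unit of x
# (ggplot2's ..count..); times bin_w gives counts per bin.
d <- density(x)
count_per_bin <- d$y * n * bin_w

# Trapezoidal rule for the area under the scaled curve
trap <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
curve_area <- trap(d$x, count_per_bin)

print(c(hist_area = hist_area, curve_area = curve_area))
```

The two areas should agree to within a fraction of a percent; the small gap comes from the density's truncated tails and the numerical integration.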

score 3 · Accepted Answer · answered Oct 08 '19 at 02:01

Yes, but you have to choose the right scale factor. Since you do not provide any data, I will illustrate with the built-in iris data.

H = hist(iris$Sepal.Width, main="")

Since the heights are the frequency counts, the sum of the heights equals the number of points, which is nrow(iris). The total area of the bars (boxes) is the sum of the heights times the common width of the boxes, so

  Area = nrow(iris) * (H$breaks[2] - H$breaks[1])

In this case, it is 150 * 0.2 = 30, but it is better to keep it as a formula.
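A quick way to check this arithmetic in R is to let `hist()` compute the bins without drawing them (`plot = FALSE`). The names `n_points` and `bin_width` are illustrative; `H` and `Area` match the answer.

```r
# Reproduce the area calculation for the iris example without drawing
H <- hist(iris$Sepal.Width, plot = FALSE)

n_points  <- sum(H$counts)                 # equals nrow(iris), i.e. 150
bin_width <- H$breaks[2] - H$breaks[1]     # 0.2 with the default Sturges breaks
Area      <- n_points * bin_width          # total area of the bars

print(c(n = n_points, width = bin_width, area = Area))
```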

Now the area under a density curve is one, so the scale factor we want is nrow(iris) * (H$breaks[2] - H$breaks[1]), which makes the two areas equal. Where do you apply the scale factor?
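Written out, with $n$ observations, $k$ bins of common width $w$ (so $w$ = `H$breaks[2] - H$breaks[1]`), bar heights $c_i$ and a density estimate $\hat f$, the argument is:

```latex
\sum_{i=1}^{k} c_i \, w = n\,w,
\qquad
\int \hat f(x)\,dx = 1
\quad\Longrightarrow\quad
\int n\,w\,\hat f(x)\,dx = n\,w .
```

So multiplying the density by $n\,w$ (here $150 \times 0.2 = 30$) gives a curve with the same area as the bars.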

DENS = density(iris$Sepal.Width)
str(DENS)
List of 7
 $ x        : num [1:512] 1.63 1.64 1.64 1.65 1.65 ...
 $ y        : num [1:512] 0.000244 0.000283 0.000329 0.000379 0.000436 ...
 $ bw       : num 0.123
 $ n        : int 150
 $ call     : language density.default(x = iris$Sepal.Width)
 $ data.name: chr "iris$Sepal.Width"
 $ has.na   : logi FALSE

We want to scale the y values for the density plot, so we use:

DENS$y = DENS$y * nrow(iris) * (H$breaks[2] - H$breaks[1])

and add the line to the histogram:

lines(DENS)

You can make this a bit nicer by adjusting the bandwidth of the density estimate; `adjust = 0.7` uses 70% of the default bandwidth, giving a less smoothed curve:

H = hist(iris$Sepal.Width, main="")
DENS = density(iris$Sepal.Width, adjust=0.7)
DENS$y = DENS$y * nrow(iris) * (H$breaks[2] - H$breaks[1])
lines(DENS)
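If you do this often, the steps can be wrapped in a small function. The name `hist_with_count_density` and its `adjust` and `plot` arguments are hypothetical, not part of base R; this is a sketch that assumes equal-width bins, which is what `hist()` produces with its default breaks.

```r
# Hypothetical helper: count histogram plus a density curve rescaled to counts.
# Assumes equal-width bins (true for hist()'s default breaks).
hist_with_count_density <- function(x, adjust = 1, plot = TRUE, ...) {
  x <- x[is.finite(x)]                            # hist() drops non-finite values too
  H <- hist(x, plot = plot, ...)                  # draws the bars if plot = TRUE
  bin_width <- H$breaks[2] - H$breaks[1]
  DENS <- density(x, adjust = adjust)
  DENS$y <- DENS$y * length(x) * bin_width        # density -> counts per bin
  if (plot) lines(DENS)
  invisible(DENS)
}

# Usage with the iris data; plot = TRUE (the default) draws bars and curve
d <- hist_with_count_density(iris$Sepal.Width, adjust = 0.7, plot = FALSE)
```

Returning the rescaled `density` object invisibly lets you reuse it, for example to draw it in a different colour with `lines()`.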

Thank you so much! Especially for explaining the structure of density function result! — Eugene, Oct 08 '19 at 14:49
