2

I have a histogram created in ggplot2 and I'd like to overlay a density line for the same data. Importantly, I don't want to convert the histogram to density values; I want to keep N (counts) on the y-axis. Is there a way to overlay the histogram and the density plot without transforming the histogram, by scaling up the density curve instead?

The histogram for this data:

img1

The initial density plot for the same data:

img2

The overlay I want, except that this version has density on the y-axis instead of counts:

img3

Eugene
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. Right now we've got neither code nor data, so it's hard to do more than guess what you're doing exactly – camille Oct 08 '19 at 02:29

2 Answers

7

You'll want to use the ..count.. variable computed by stat_density, and then multiply it by the bin width.

library(ggplot2)
set.seed(15)
df <- data.frame(x=rnorm(500, sd=10))
ggplot(df, aes(x = x)) +
  geom_histogram(colour = "black", fill = "white", binwidth = 5) +
  geom_density(aes(y = ..count.. * 5), alpha = .2, fill = "#FF6666")  # 5 = binwidth
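For readers on recent ggplot2 releases, here is a minimal sketch of the same plot using `after_stat()`, which replaced the `..count..` notation (the double-dot form is deprecated as of ggplot2 3.4.0). The variable name `bin_width` is an assumption introduced here so that the histogram and the density layer cannot drift apart; it is not part of the original answer.

```r
library(ggplot2)

set.seed(15)
df <- data.frame(x = rnorm(500, sd = 10))

# Assumed helper variable: one bin width shared by both layers
bin_width <- 5

p <- ggplot(df, aes(x = x)) +
  geom_histogram(colour = "black", fill = "white", binwidth = bin_width) +
  # count is density * n (count per unit of x); multiplying by the
  # bin width converts it to count per bin, matching the bar heights
  geom_density(aes(y = after_stat(count * bin_width)),
               alpha = 0.2, fill = "#FF6666")
p
```

Changing `bin_width` in one place rescales the curve together with the bars.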


Aaron left Stack Overflow
  • Thanks a lot! Could I ask why we normalize by the binwidth? And where exactly does the "..count.." construction come from? It's an output of the ggplot + geom_hist function, right? And how do you use these ".." double-dot symbols? Sorry for these basic questions, and thanks again for your answer, that's exactly what I was looking for! – Eugene Oct 08 '19 at 14:58
  • `..count..` is created by `stat_density`, which is the default stat for `geom_density`; you use it like any other variable. I don't know exactly what happens behind the scenes, but see the documentation for `stat_density` to see what it creates. And you scale by the binwidth because `..count..` is count per unit on the x-axis, but the histogram is count per binwidth. – Aaron left Stack Overflow Oct 08 '19 at 21:36
  • If this answers your question, click the check mark to let others know. Thanks! – Aaron left Stack Overflow Oct 08 '19 at 21:38
3

Yes, but you have to choose the right scale factor. Since you do not provide any data, I will illustrate with the built-in iris data.

H = hist(iris$Sepal.Width, main="")

Base histogram

Since the heights are the frequency counts, the sum of the heights should equal the number of points which is nrow(iris). The area under the curve (boxes) is the sum of the heights times the width of the boxes, so

  Area = nrow(iris) * (H$breaks[2] - H$breaks[1])

In this case, it is 150 * 0.2 = 30, but better to keep it as a formula.

Now the area under the standard density curve is one, so the scale factor that we want to use is nrow(iris) * (H$breaks[2] - H$breaks[1]) to make the areas the same. Where do you apply the scale factor?

DENS = density(iris$Sepal.Width)
str(DENS)
List of 7
 $ x        : num [1:512] 1.63 1.64 1.64 1.65 1.65 ...
 $ y        : num [1:512] 0.000244 0.000283 0.000329 0.000379 0.000436 ...
 $ bw       : num 0.123
 $ n        : int 150
 $ call     : language density.default(x = iris$Sepal.Width)
 $ data.name: chr "iris$Sepal.Width"
 $ has.na   : logi FALSE

We want to scale the y values for the density plot, so we use:

DENS$y = DENS$y * nrow(iris) * (H$breaks[2] - H$breaks[1])

and add the line to the histogram

lines(DENS)

Histogram with density curve
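As a rough numerical check of the area argument above, the following sketch compares the area of the histogram bars with the area under the scaled density curve. The names `bin_width`, `scale_factor`, `hist_area` and `dens_area` are introduced here for illustration, and the trapezoid rule is just one simple way to approximate the integral.

```r
# Recompute the histogram without drawing it
H <- hist(iris$Sepal.Width, plot = FALSE)
bin_width <- H$breaks[2] - H$breaks[1]  # the default breaks here are equally spaced
scale_factor <- nrow(iris) * bin_width

# Area of the bars: each bar contributes count * width
hist_area <- sum(H$counts * diff(H$breaks))

# Trapezoid-rule area under the scaled density curve
D <- density(iris$Sepal.Width)
scaled_y <- D$y * scale_factor
dens_area <- sum(diff(D$x) * (head(scaled_y, -1) + tail(scaled_y, -1)) / 2)

# The two areas should agree closely (the density one is approximate)
print(c(histogram = hist_area, density = dens_area))
```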

You can make this a bit nicer by adjusting the bandwidth for the density calculation. Here adjust=0.7 uses 70% of the default bandwidth, so the curve follows the bars more closely:

H = hist(iris$Sepal.Width, main="")
DENS = density(iris$Sepal.Width, adjust=0.7)
DENS$y = DENS$y * nrow(iris) * (H$breaks[2] - H$breaks[1])
lines(DENS)

Histogram with adjusted density curve
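If you need this more than once, the steps can be wrapped in a small function. This is only a sketch: `hist_with_density` is a hypothetical helper name, not something provided by base R, and it assumes equal-width bins, because a single scale factor of n times the bin width only applies in that case.

```r
# Hypothetical helper (not part of base R): draw a count histogram and
# overlay a kernel density curve scaled to the same area.
hist_with_density <- function(x, breaks = "Sturges", adjust = 1) {
  x <- x[!is.na(x)]
  H <- hist(x, breaks = breaks, main = "")
  widths <- diff(H$breaks)
  # A single scale factor n * width is only valid for equal-width bins
  if (!isTRUE(all.equal(min(widths), max(widths)))) {
    stop("bins must have equal width")
  }
  D <- density(x, adjust = adjust)
  D$y <- D$y * length(x) * widths[1]  # density -> counts per bin
  lines(D)
  invisible(list(hist = H, density = D))
}

res <- hist_with_density(iris$Sepal.Width, adjust = 0.7)
```

The function returns the histogram and the scaled density invisibly, so the values can be inspected or reused after plotting.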

G5W