Overlaying histogram and density estimate

Question

In psychometrics, you might have discrete measurements (e.g. on a scale from 1 to 4), but still assume that those measurements represent an underlying continuous process.

I am trying to produce a plot that depicts these discrete measurements and the underlying distribution.

So far I haven't managed to produce what I want. The best I have come up with so far is to overlay the density plot on a histogram, but there is a mismatch between the scale of the histogram densities and the scale of the density line:

library(ggplot2)

var1  <- c(rep(1, times = 50),
           rep(2, times = 60),
           rep(3, times = 40),
           rep(4, times = 30))

df <- as.data.frame(var1)

ggplot(df, aes(x=var1)) +
  geom_line(aes(y=..density..),stat = 'density') +
  geom_histogram(aes(y=..density..))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

My understanding is that I am looking at two different density functions:

on the histogram, I oversample some parts of the distribution and undersample others, which produces "distorted" density estimates compared to
the density line, where the density estimate is computed over the whole (continuous) range of values on my interval.
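To make the mismatch concrete, here is a rough base R sketch of how the two sets of numbers arise. It assumes that a histogram bar's density is its count divided by (n × bin width), and that the 30 default bins over the range 1 to 4 give a bin width of roughly 3/29; that width is my approximation of the default, not something I have verified. The names `narrow_width`, `dens_narrow`, `dens_unit` and `kde` are only illustrative.

```r
var1 <- c(rep(1, times = 50),
          rep(2, times = 60),
          rep(3, times = 40),
          rep(4, times = 30))
n <- length(var1)

# Counts per observed value (1, 2, 3, 4)
counts <- as.vector(table(var1))

# Assumed approximate bin width when 30 bins span the range 1..4
narrow_width <- diff(range(var1)) / 29

# Bar height on the density scale = count / (n * bin width)
dens_narrow <- counts / (n * narrow_width)  # very narrow bins: tall spikes
dens_unit   <- counts / (n * 1)             # bins of width 1: plain proportions

# Kernel density estimate with the default bandwidth
kde <- density(var1)

max(dens_narrow)  # height of the tallest narrow bar
max(dens_unit)    # height of the tallest unit-width bar
max(kde$y)        # peak of the smoothed curve
```

If that assumption holds, the narrow bars are tall simply because each one packs all of its mass into a very small width, whereas the kernel estimate spreads the same mass over a much wider window.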

Is there a way to get both functions on the same scale? If not, maybe someone has a clue why it makes no statistical sense to try.

Thanks!

the trick is to use `stat(density)` as y in `geom_histogram`, as in: https://stackoverflow.com/a/54700242/7941188 — tjebo, Dec 02 '20 at 10:25
Thanks for the amazingly quick responses! I have another piece of work at hand right now, I will come back to this later. — raphael_ldl, Dec 02 '20 at 12:08
I am still not getting the result I am looking for with this. I am trying to produce a plot where the "tip" of the histogram is at the same height as the "tip" of the density curve. Maybe you can share a reprex to see if we are really talking about the same thing? — raphael_ldl, Dec 02 '20 at 13:12
to get the tip of the density curve at a desired point may not be straightforward - it is very much a matter of the smoothing parameter (I think it is something like `bw = ...` within the `geom_density` call) — tjebo, Dec 03 '20 at 15:11
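A minimal sketch of what the comments above seem to suggest, under a few assumptions: ggplot2 3.3.0 or later (for `after_stat()`, the successor to `stat(density)`), a histogram `binwidth` of 1 so that each scale point gets its own bar, and a hand-picked bandwidth. The value in `bw_guess` is hypothetical and would need tuning by eye; nothing here guarantees that the tip of the curve lands exactly on the tip of the tallest bar.

```r
library(ggplot2)

df <- data.frame(var1 = c(rep(1, times = 50),
                          rep(2, times = 60),
                          rep(3, times = 40),
                          rep(4, times = 30)))

# Hypothetical smoothing value, chosen only for illustration
bw_guess <- 0.4

p <- ggplot(df, aes(x = var1)) +
  # One bar per scale point; with width 1 the bar heights are proportions
  geom_histogram(aes(y = after_stat(density)), binwidth = 1) +
  # Kernel density curve; a larger bw flattens the peaks, a smaller bw sharpens them
  geom_density(bw = bw_guess)

p
```

With unit-width bins both layers are on a comparable density scale, and changing `bw_guess` moves the peak of the curve up or down relative to the bars.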
