In psychometrics, you might have discrete measurements (e.g. on a scale from 1 to 4), but still assume that those measurements represent an underlying continuous process.
I am trying to produce a plot that depicts these discrete measurements and the underlying distribution.
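To make the "underlying continuous process" idea concrete, here is a minimal sketch. It assumes the common threshold (probit) model for ordinal items: a latent standard-normal variable is cut at three thresholds into categories 1 to 4. The counts are the same as in `var1`. The threshold values follow from the observed cumulative proportions, but the normal shape of the latent variable is an assumption, not something the discrete data establish on their own.

```r
# Observed counts for categories 1-4 (same data as var1)
counts <- c(`1` = 50, `2` = 60, `3` = 40, `4` = 30)
props <- counts / sum(counts)

# Assumption: a latent standard-normal variable is cut at thresholds tau_1 < tau_2 < tau_3.
# Each threshold is the normal quantile of the cumulative proportion below it.
thresholds <- qnorm(cumsum(props)[-length(props)])

# Check: the normal probability mass between consecutive thresholds
# reproduces the observed category proportions
implied_props <- diff(pnorm(c(-Inf, thresholds, Inf)))

print(round(thresholds, 3))
print(round(implied_props, 3))
```

Under this assumption, the plot would show a normal density curve with the thresholds drawn as vertical lines. In base graphics, for instance, you could call `curve(dnorm(x), -3, 3)` followed by `abline(v = thresholds, lty = 2)`.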
So far I haven't managed to get what I want. The best I have come up with is overlaying a density line on a histogram. However, the scale of the histogram densities does not match the scale of the density line:
```r
library(ggplot2)

# Discrete responses on a 1-4 scale
var1 <- c(rep(1, times = 50),
          rep(2, times = 60),
          rep(3, times = 40),
          rep(4, times = 30))
df <- as.data.frame(var1)

# after_stat(density) replaces the deprecated ..density.. notation
ggplot(df, aes(x = var1)) +
  geom_line(aes(y = after_stat(density)), stat = "density") +
  geom_histogram(aes(y = after_stat(density)))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```
My understanding is that I am looking at two different density functions:
- on the histogram, I oversample some parts of the distribution and undersample others, which produces "distorted" density estimates, compared to
- the density line, where the density estimate is computed over the whole (continuous) range of values on my interval.
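To see where the scale difference comes from, recall that a histogram on the density scale sets each bar's height to count / (n * binwidth), so the bar's *area*, not its height, equals the proportion of observations. With `bins = 30`, ggplot2 picks narrow bins here (roughly 0.1 wide), so the four non-empty bars end up about ten times taller than the proportions themselves. Meanwhile the kernel density line spreads the same total area of 1 across the whole range. The sketch below reproduces this arithmetic by hand. The helper `bar_density` is hypothetical, written only for illustration, and it assumes each response value falls in its own bin, which holds for any bin width of at most 1 with integer data.

```r
# Same data as var1 in the question
var1 <- rep(1:4, times = c(50, 60, 40, 30))

# Hypothetical helper: density height of each non-empty histogram bar,
# i.e. count / (n * binwidth), assuming one response value per bin
bar_density <- function(x, binwidth) {
  as.numeric(table(x)) / (length(x) * binwidth)
}

wide   <- bar_density(var1, binwidth = 1)    # heights equal the proportions
narrow <- bar_density(var1, binwidth = 0.1)  # same bars, ten times taller

# In both cases the total bar area (height * width) is 1
print(sum(wide * 1))
print(sum(narrow * 0.1))
```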
So, is there a way to get both functions onto the same scale? If not, maybe someone can explain why it makes no statistical sense to try.
Thanks!