
Possible Duplicate:
ggplot2: Overlay histogram with density curve

Sorry for what is probably a simple question, but I have a bit of a problem.

I have created a histogram that is based on a binomial distribution with mean=0.65 and sd=0.015 with 10000 samples. The histogram itself looks fine. However, I need to overlay a normal distribution on top of this (with the same mean and standard deviation). Currently, I have the following:

qplot(x, data=prob, geom="histogram", binwidth=.05) + stat_function(geom="line", fun=dnorm, arg=list(mean=0.65, sd=0.015))

A distribution shows up, but it is TINY. This is likely because the histogram's count at the mean goes up to almost 2,000, while the values of the normal density are much smaller. Simply put, the curve is not scaled to the data the way R would do automatically. Is there a way to make the normal curve fit the histogram, or to manipulate the histogram so that it fits the normal distribution?

Thanks in advance.

user1044116
  • Duplicate of http://stackoverflow.com/questions/7182556/how-to-add-gaussian-curve-to-histogram-created-with-qplot ; http://stackoverflow.com/questions/5688082/ggplot2-overlay-histogram-with-density-curve ? – Ben Bolker Nov 13 '11 at 14:20

2 Answers


"The distribution is tiny" because you are plotting a density function over counts. You should use the same scale in both layers of the plot.

I tried to generate some data resembling your example:

x <- rbinom(10000, 10, 0.15)
prob <- data.frame(x=x/(mean(x)/0.65))
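A quick sanity check on the simulated data may be useful here. The sketch below regenerates the same data (the seed is an arbitrary choice added for reproducibility, not part of the original answer) and inspects its mean and standard deviation. The rescaling pins the mean at 0.65 by construction, but it does not produce sd = 0.015: in theory the standard deviation is sqrt(10*0.15*0.85) * 0.65 / 1.5, which is roughly 0.49, so the normal curve with sd = 0.015 will look far narrower than this histogram.

```r
set.seed(1)  # arbitrary seed, an assumption added only for reproducibility

# Same construction as above: binomial draws rescaled so the mean is 0.65
x <- rbinom(10000, 10, 0.15)
prob <- data.frame(x = x / (mean(x) / 0.65))

sim_mean <- mean(prob$x)  # 0.65 by construction (up to floating-point error)
sim_sd   <- sd(prob$x)    # much larger than the 0.015 stated in the question

c(mean = sim_mean, sd = sim_sd)
```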

And plot both as density functions:

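library(ggplot2)
# y = after_stat(density) rescales the bars to a density; older ggplot2 (< 3.3.0) spells it ..density..
ggplot(prob, aes(x = x)) + geom_histogram(aes(y = after_stat(density)), binwidth = .05) +
  stat_function(geom = "line", fun = dnorm, args = list(mean = 0.65, sd = 0.015))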

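The question also asks whether the curve can be fitted to the count histogram instead. One way to do that, sketched here as an alternative that is not part of the original answer, is to scale the density up to counts: the expected count in a bin is approximately (number of observations) * (bin width) * (density). The helper `scaled_dnorm`, the stand-in data drawn with `rnorm`, the seed, and the bin width of 0.005 are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(42)          # arbitrary seed, for reproducibility only
n_obs    <- 10000     # sample size mentioned in the question
binwidth <- 0.005     # hypothetical bin width chosen for this sketch

# Hypothetical stand-in data; substitute your own `prob` data frame
prob <- data.frame(x = rnorm(n_obs, mean = 0.65, sd = 0.015))

# Hypothetical helper: normal density rescaled to the count scale,
# since expected bin count is approximately n * binwidth * density
scaled_dnorm <- function(x, mean, sd, n, binwidth) {
  dnorm(x, mean = mean, sd = sd) * n * binwidth
}

# Count histogram with the rescaled normal curve on top
p <- ggplot(prob, aes(x = x)) +
  geom_histogram(binwidth = binwidth) +
  stat_function(geom = "line", fun = scaled_dnorm,
                args = list(mean = 0.65, sd = 0.015,
                            n = n_obs, binwidth = binwidth))
```

Calling `print(p)` draws the plot. The scaling only lines up if the `binwidth` passed to `scaled_dnorm` is the same one used in `geom_histogram`, which is why both read from a single variable.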

daroczig
  • Worked perfectly. Finding out how to change the count histogram into a density function was the magic step I couldn't figure out. Thanks! – user1044116 Nov 13 '11 at 20:22

@daroczig's answer is correct about needing to be consistent in plotting densities rather than counts, but I'm having trouble seeing how you managed to get a binomial sample with those properties. In particular, the mean of the binomial is n*p, the variance is n*p*(1-p), and the standard deviation is sqrt(n*p*(1-p)), so:

b.m <- 0.65
b.sd <- 0.015

Calculate variance:

b.v <- b.sd^2  ## n*p*(1-p)

Calculate p:

## (1-p) = b.v/(n*p) = b.v/b.m
## p = 1-b.v/b.m
b.p <- 1-b.v/b.m

Calculate n:

## n = n*p/p = b.m/b.p
b.n <- b.m/b.p

This gives n=0.6502251, p=0.9996538 -- so I don't see how you can get this binomial distribution without n<1, unless I messed up the algebra ...
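As a check on the algebra, the solved values can be plugged back into the binomial formulas. This is only a minimal verification sketch; the names `mean_check` and `sd_check` are introduced here for illustration.

```r
b.m  <- 0.65    # target mean, n*p
b.sd <- 0.015   # target standard deviation, sqrt(n*p*(1-p))

# Solve for p and n exactly as in the steps above
b.p <- 1 - b.sd^2 / b.m
b.n <- b.m / b.p

# Plug the solution back into the binomial mean and sd formulas
mean_check <- b.n * b.p                     # should recover 0.65
sd_check   <- sqrt(b.n * b.p * (1 - b.p))   # should recover 0.015

c(n = b.n, p = b.p, mean = mean_check, sd = sd_check)
```

Both moments are recovered, so the algebra holds, and the implied number of trials is indeed below 1.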

Ben Bolker