I have the following script, which makes my normal curve too small:

ggplot(exercise2d_df, aes(x=residuals_list)) + 
    geom_histogram(alpha=0.2, position="identity") + 
    stat_function(fun = dnorm, args = c(mean=mean(residuals_list), sd=sd(residuals_list)), size = 1, color = "red")

My data is:

residuals_list = c(0.183377698905335, 7.18337769890574, 1.18337769890566, 4.18337769890565, 5.18337769890565, 0.183377698905655, 3.18337769890566,-0.816622301094345, -2.81662230109434, 3.18337769890566, 8.18337769890566, 2.18337769890566, 4.18337769890565, 0.183377698905655, 5.18337769890565, -10.0541259982254, -9.05412599822537, -8.05412599822537, -5.05412599822537, -4.05412599822537, -3.05412599822537, -10.0541259982254, -6.05412599822537, -8.05412599822537, -7.05412599822537, -6.05412599822537, -7.05412599822537, -7.05412599822537, -5.05412599822537, -4.05412599822537, -3.05412599822537, -11.0541259982254, -9.05412599822537, -3.05412599822537, -1.05412599822537, -7.2916296953564, -8.2916296953564, -2.2916296953564, 0.708370304643597, -5.2916296953564, -3.2916296953564, -6.2916296953564, -2.2916296953564, 1.7083703046436, -5.2916296953564, -9.2916296953564, -5.2916296953564, -4.2916296953564, -4.2916296953564, -0.291629695356403, 1.18337769890566, -4.81662230109435, 0.183377698905655, 0.183377698905655, 0.183377698905655, 5.18337769890565, -0.816622301094345, -4.81662230109435, -3.81662230109434, -1.81662230109434, -0.816622301094345, 2.18337769890566, 3.18337769890566, 6.18337769890565, 8.18337769890566, 2.94587400177463, -3.05412599822537, 3.94587400177463, 4.94587400177463, 6.94587400177463, -0.0541259982253741, -0.0541259982253741, -0.0541259982253741, 0.945874001774626, 0.945874001774626, 0.945874001774626, 0.945874001774626, 3.94587400177463, 2.94587400177463, 0.945874001774626, 1.94587400177463, -3.05412599822537, 5.7083703046436, 4.7083703046436, 1.7083703046436, 11.7083703046436, 6.7083703046436, 7.7083703046436, 2.7083703046436, 3.7083703046436, 9.7083703046436, 8.7083703046436, 6.7083703046436, 6.7083703046436, -0.291629695356403, 5.7083703046436, 4.7083703046436, -1.2916296953564, 9.7083703046436, 8.7083703046436, 1.7083703046436, 2.7083703046436, 3.7083703046436)

This code creates a graph like the following:

[Image: resulting graph]

How do I stretch the normal curve so that it fits the histogram?

(Note that this is not a question about how to superimpose a normal curve on a histogram in ggplot, even though that is what I am ultimately after, so this is not a duplicate.)

The Unfun Cat
  • The easiest option was fitting my histogram to my normal curve. All I had to do was add `aes(y= ..density..)` as an option to geom_histogram, like so: `geom_histogram(alpha=0.2, position="identity", aes(y= ..density..))` – The Unfun Cat Feb 19 '14 at 10:40
  • Still interested in knowing whether this is possible, though: keeping the counts on the axis would be nice, but that is lost with the method outlined in the comment above. – The Unfun Cat Feb 19 '14 at 12:51
  • Please provide the data you are using to make your code [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – tonytonov Feb 19 '14 at 13:24

1 Answer

The area under the normal curve is currently 1, whereas the area of the histogram is the width of the bars times the number of points. If you multiply the height of the normal curve by that value, the two will have the same area. The following works with the default binwidth calculation (specifying a binwidth explicitly may be better and more direct):

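# Normal density rescaled to the histogram's area:
# bin width (default calculation: range/30) times the number of points.
tmpfun <- function(x, mean, sd) {
    diff(range(residuals_list)) / 30 * length(residuals_list) * dnorm(x, mean, sd)
}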


ggplot(exercise2d_df, aes(x=residuals_list)) + 
    geom_histogram(alpha=0.2, position="identity") + 
    stat_function(fun = tmpfun, args = c(mean=mean(residuals_list), 
        sd=sd(residuals_list)), size = 1, color = "red")
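
As a variation on the above, here is a minimal sketch with the bin width set explicitly, so the scaling factor does not rely on the default calculation. The names `resid`, `df`, `bw`, and `scaled_dnorm` are made up for illustration, and the data are simulated stand-ins rather than the residuals from the question.

```r
library(ggplot2)

# Simulated stand-in data (assumption: NOT the residuals from the question)
set.seed(1)
resid <- rnorm(100, mean = 0, sd = 5)
df <- data.frame(resid = resid)

# Bin width chosen by hand (assumption); the same value is used for
# both the histogram and the scaling of the curve
bw <- 1

# Normal density rescaled to count units: total area = n * bin width
scaled_dnorm <- function(x, mean, sd, n, bw) {
  n * bw * dnorm(x, mean = mean, sd = sd)
}

p <- ggplot(df, aes(x = resid)) +
  geom_histogram(binwidth = bw, alpha = 0.2) +
  stat_function(
    fun = scaled_dnorm,
    args = list(mean = mean(resid), sd = sd(resid),
                n = length(resid), bw = bw),
    color = "red"
  )
print(p)
```

Because the histogram and the curve share the same `bw`, the y-axis stays in counts and the curve keeps matching the bars if the bin width is changed.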
Greg Snow