1

In R I can overlay a normal curve on a density histogram. I can also convert the density histogram to a probability histogram:

a <- rnorm(100)
test <- hist(a, plot = FALSE)
test$counts <- (test$counts / sum(test$counts)) * 100   # percentage of observations per bin
plot(test, ylab = "Probability")
curve(dnorm(x, mean = mean(a), sd = sd(a)), add = TRUE)

But then I can no longer overlay the normal curve, because it is on a different scale from the bars.


Is there a solution? Maybe a second y-axis?

Glorfindel

3 Answers

4

Now the question is clear to me. Indeed a second y-axis seems to be the best choice for this as the two data sets have completely different scales.

In order to do this you could do:

set.seed(2)
a <- rnorm(100)
test <- hist(a, plot = FALSE)
test$counts <- (test$counts / sum(test$counts)) * 100   # percentage of observations per bin
plot(test, ylab = "Probability")
# start a new plot on top of the histogram
par(new = TRUE)
# instead of curve(), build the data yourself over the histogram's x-range
# (this is essentially what curve() does internally)
x_grid <- seq(min(test$breaks), max(test$breaks), by = 0.01)
curve_data <- dnorm(x_grid, mean = mean(a), sd = sd(a))
# draw the line with no axes or labels; xlim keeps both x-axes aligned
plot(x_grid, curve_data, axes = FALSE, xlab = '', ylab = '', type = 'l', col = 'red',
     xlim = range(test$breaks))
# add the density scale as a second y-axis on the right
axis(4, at = pretty(range(curve_data)))
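
If a second axis is not wanted, one alternative is to put the curve on the same percentage scale as the bars. This is only a sketch under the assumption that all bins have the same width (the default for hist()): the percentage in a bin is roughly density times bin width times 100. The names h, pct, bin_width and scaled_normal are made up for this illustration.

```r
set.seed(2)
a <- rnorm(100)
h <- hist(a, plot = FALSE)

# Percentage of observations per bin (the same values the question computes)
pct <- h$counts / sum(h$counts) * 100

# Assumption: equal-width bins, so one width applies to every bar
bin_width <- diff(h$breaks)[1]

# Normal density rescaled to "percent per bin" units
scaled_normal <- function(x) {
  dnorm(x, mean = mean(a), sd = sd(a)) * bin_width * 100
}

# Draw the percentage bars, leaving room for the peak of the curve
h$counts <- pct
plot(h, ylab = "Probability (%)",
     ylim = c(0, max(pct, scaled_normal(mean(a)))))
curve(scaled_normal(x), add = TRUE, col = "red")
```

With this scaling a single y-axis serves both the bars and the curve. It would not be appropriate for unequal bin widths.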


LyzandeR
  • I think I should only have used one curve instead of two but the solution is the same. – LyzandeR Oct 12 '15 at 14:25
  • Thanks, I probably didn't explain it correctly. I want the y-axis to be the probability and not the density. Then I need to overlay the normal curve. Should I use another y-axis perhaps? –  Oct 12 '15 at 14:46
  • 1
    @g256 I still don't get the problem. It seems like both your graphs have limits between 0 and 1. Why would you need a second y-axis? Even density is a probability. – LyzandeR Oct 12 '15 at 14:49
  • @g256 Also if the problem is that the density varies between 0-1 whereas the curve can go beyond 1 you shouldn't be using `curve(dnorm(x, ...` but something that shows this difference... – LyzandeR Oct 12 '15 at 14:53
  • 1
    Thanks this is the answer to my question. –  Oct 12 '15 at 15:44
2

First, save your rnorm data in a variable; otherwise you get different data each time.

seed <- rnorm(100)

Next, go ahead with

hist(seed, probability = TRUE)
curve(dnorm(x, mean = mean(seed), sd = sd(seed)), add = TRUE)

Now you have the expected result: a histogram with a density curve.
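
To see how the density scale relates to per-bin probabilities, here is a small check. It is a sketch with made-up names (x_demo, h_demo, bin_prob): each bar's height times its width gives the proportion of observations in that bin, and those proportions sum to 1.

```r
set.seed(1)
x_demo <- rnorm(100)
h_demo <- hist(x_demo, plot = FALSE)

# Width of each bin, taken from the break points
widths <- diff(h_demo$breaks)

# Bar area = density * width = proportion of observations in the bin
bin_prob <- h_demo$density * widths

# The same proportions computed directly from the counts
prop_from_counts <- h_demo$counts / sum(h_demo$counts)

# Total area under a density histogram
total_area <- sum(bin_prob)
```

So the bar heights on a density histogram are not probabilities by themselves; the bar areas are.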

Ben Bolker
Patrick C.
  • Thanks, I probably didn't explain it correctly. I want the y-axis to be the probability and not the density. Then I need to overlay the normal curve. Should I use another y-axis perhaps? –  Oct 12 '15 at 14:35
  • Manually set the y-axis to the interval [0,1] with ylim = c(0,1). Then the probability function added with curve(,..., add=T) should fit in the same plot. – Patrick C. Oct 12 '15 at 15:16
0

The y-axis isn't a "probability" as you have labeled it. It is count data. If you convert your histogram to probabilities, you shouldn't have a problem:

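# the data is named a, not x: inside curve(), x is the grid of evaluation points,
# so mean(x) and sd(x) there would not be the sample mean and sd
a <- rnorm(1000)
hist(a, freq = FALSE, ylab = "Probability")
curve(dnorm(x, mean = mean(a), sd = sd(a)), add = TRUE)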
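
One thing to watch with curve(): inside its expression, x stands for the grid of points that curve() evaluates, not for a data vector of that name. A minimal sketch showing this (samples and grid_info are made-up names, and the mean of 5 and the plotting range of 0 to 20 are arbitrary choices for the illustration):

```r
set.seed(42)
samples <- rnorm(1000, mean = 5, sd = 2)

# curve() returns (invisibly) the x grid it used and the y values it computed
grid_info <- curve(dnorm(x, mean = mean(samples), sd = sd(samples)),
                   from = 0, to = 20, n = 101)

# The grid is evenly spaced over [0, 20]; its mean is the midpoint of the range
grid_mean <- mean(grid_info$x)

# The sample mean is a property of the data and is close to 5
sample_mean <- mean(samples)
```

Keeping the data in a variable that is not called x avoids mixing the two up.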


alexwhitworth