Difference between prop.table() & dnorm()

Question

Could someone explain why the following two plots yield different results:

prop.table(table(S)) [where 'S' is the Random variable...representing Roulette wheel outcomes in this case]
dnorm([a list of values over the range of S], mean(S), sd(S))

Here is my code snippet:

# Frequency Plot of Random Variable (S)

plot(prop.table(table(S)), xlab = "Net Profit", ylab = "Probability", type = "h")

base <- seq(min(S),max(S),length = B)
pdf = data.frame(profit = base, probability = dnorm(base,avg,sd))

lines(pdf)

I can't upload pictures of my plot because I don't have enough reputation. However, the peak of the line plot is about half the height of the peak of the 'prop.table(table(S))' plot.

Could you clarify my understanding?

prop.table(table(S)) gives us the probability of a value occurring (as given by the value's frequency of occurrence).

dnorm(value, mean, sd) gives us the probability of a value occurring (as given by the normal distribution).

If both are the probability of the same thing, shouldn't the peaks overlap, as shown in the video?

Thanks in advance :D

Update: Here is the exact code I'm using:


set.seed(1)
plays <- 1000
B <- 10000

#Monte Carlo Sim for Roulette Wheel

S <- replicate(B,{  # S because Random Variable
  sum(sample(c(-1,1), plays, replace = TRUE, prob = c(18/38,20/38)))
  # -1 -> casino loses the bet; 1 -> casino wins the bet
})

avg = mean(S); sd = sd(S)

# Frequency Plot of Random Variable of R. Wheel outcome
plot(prop.table(table(S)), xlab = "Net Profit", ylab = "Probability", type = "h")

base <- seq(min(S),max(S),length = B)
pdf = data.frame(profit = base, probability = dnorm(base,avg,sd))

lines(pdf)

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. But I think you might be getting expected/observed probabilities mixed up as well as confusing discrete vs continuous random variables which are both really more statistics questions than programming questions. Questions about statistics belong on [stats.se] instead of Stack Overflow. — MrFlick, Apr 13 '20 at 19:05
Densities for continuous distributions (as calculated by dnorm) aren't proportions or probabilities, so you wouldn't expect them to match up with the proportions of a discrete distribution. — George Savva, Apr 13 '20 at 19:24
@GeorgeSavva, are you saying dnorm gives me the 'probability density' for the value, and that this is not the real probability? So, in order to get the probability, what should I do? ```pnorm(x+Δx, mean, sd) - pnorm(x, mean, sd)``` — captain_nemo, Apr 13 '20 at 19:33
Density is not a probability. You have to multiply the density by some value. To make it equal to the probability from `prop.table`, find the bin width of S (your sample). In your case, it is 2. So prob = density * 2. — Edward, Apr 13 '20 at 23:46

score 1 · Accepted Answer · answered Apr 14 '20 at 00:02

A probability density is not a probability. It is a probability per unit of something.
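As a quick illustration of that point (not part of the original answer), the sketch below shows a density value larger than 1, which could never be a probability. The standard deviation of 0.1 and the grid step `dx` are arbitrary choices made only for this example.

```r
# Density of a narrow normal distribution at its mean.
# sd = 0.1 is an arbitrary value chosen for illustration.
peak_density <- dnorm(0, mean = 0, sd = 0.1)
peak_density  # roughly 3.99, so it cannot be a probability

# Probability comes from density * width, summed over small intervals.
dx <- 0.001                      # hypothetical grid step
x  <- seq(-1, 1, by = dx)        # covers +/- 10 standard deviations
total_area <- sum(dnorm(x, mean = 0, sd = 0.1) * dx)
total_area    # very close to 1
```

The density is only turned into something probability-like once it is multiplied by a width, which is the same idea as the factor of 2 used for `S`.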

Every value in your sample, S, is an even number, since each play contributes either -1 or 1 and the number of plays (1000) is even. When you tabulate, you'll notice this. Then prop.table returns the proportions (empirical probabilities) of those values (-2, 0, 2, 4, 6, ...). These are discrete values spaced 2 apart, not continuous.
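One way to check the spacing is sketched below. As a shortcut that is an assumption of this example (the question samples every play individually), the number of casino wins `W` is drawn with `rbinom`, which gives a sample with the same distribution because the net profit is wins minus losses.

```r
set.seed(1)
plays <- 1000
B     <- 10000
p_win <- 20 / 38   # casino's chance of winning a single bet, as in the question

# Assumption: draw the number of wins directly instead of sampling each play.
W <- rbinom(B, size = plays, prob = p_win)
S <- 2 * W - plays             # wins minus losses = W - (plays - W)

all_even <- all(S %% 2 == 0)   # every simulated profit is even
gaps     <- diff(sort(unique(S)))
table(gaps)                    # spacing between neighbouring observed values
```

Writing the profit as `2 * W - plays` also makes the step of 2 explicit: one extra win moves the total by 2. Gaps larger than 2 can only appear in the tails, where some even values happen not to be drawn.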

dnorm returns the density for a given normal distribution. So if you want to use dnorm to emulate a probability, you need to multiply it by the width of the interval each value represents. In this case, that is 2, the spacing between the possible values of S (the width of the histogram bars).

pdf2 = data.frame(profit = base, probability = dnorm(base,avg,sd) * 2)
lines(pdf2, col="blue", lwd=2)
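To check the rescaling numerically rather than by eye, a rough sketch along these lines could be used. It again uses the `rbinom` shortcut (an assumption of this example, not the question's code), and the names `emp`, `s`, `comparison`, `max_gap_density` and `max_gap_interval` are introduced here for illustration. It also includes the `pnorm` interval difference suggested in the comments, taken over an interval of width 2 centred on each value.

```r
set.seed(1)
plays <- 1000
B     <- 10000

# Assumption: binomial shortcut for the roulette simulation.
W <- rbinom(B, size = plays, prob = 20 / 38)
S <- 2 * W - plays

emp    <- prop.table(table(S))     # empirical proportions
values <- as.numeric(names(emp))   # observed profits
avg    <- mean(S)
s      <- sd(S)

# Density times bin width, and the matching interval probability.
scaled_density <- dnorm(values, avg, s) * 2
interval_prob  <- pnorm(values + 1, avg, s) - pnorm(values - 1, avg, s)

comparison <- data.frame(
  profit    = values,
  empirical = as.numeric(emp),
  scaled_density,
  interval_prob
)

# Largest disagreements between the three columns.
max_gap_density  <- max(abs(comparison$empirical - comparison$scaled_density))
max_gap_interval <- max(abs(comparison$scaled_density - comparison$interval_prob))
```

The two normal-based columns should be nearly identical, while the empirical column differs from them only by sampling noise.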

Thank you for this lucid explanation. Especially the one regarding the bin size :D — captain_nemo, Apr 14 '20 at 18:41
