Comments above suggested you to manually rescale yfit
, before passing it to chisq.test
. Well actually you can let chisq.test()
do this for you:
chisq.test(x = freq.os, p = yfit, rescale.p = TRUE)
Regarding your Edit
chisq.test(freq.os, yfit)
is incorrect, as it is doing independence test.
chisq.test()
can perform two statistical test:
- goodness of fit test, using argument
x
and p
;
- independence test, using argument
x
and y
.
Please read ?chisq.test
carefully. For goodness of fit test, you must use p
argument in the function, as you initially did. My answer above, using rescale.p = TRUE
will help you get around the error you saw.
How to do Pearson Chi-square test
You said you don't know how the test is done, then read this part.
You should use set.seed()
, so that when others run your code, they get the same random numbers as you got. Here is a reproducible example:
N <- 200 ## number of samples
set.seed(0) ## fix random seed for reproducibility
x <- rpois(N,50) ## generate sample
lambda <- mean(x) ## estimate mean
Now we aim to use Pearson Chi-square test to test goodness of fit. We have
Null Hypothesis: x ~ Poisson(lambda)
O <- table(x) ## contingency table / observed frequency
n <- length(O) ## number of categories
# 31
x0 <- as.integer(names(O)) ## unique sample values
p <- dpois(x0, lambda); p <- p / sum(p) ## theoretical probability under Null Hypothesis
Let's first perform Chi-square test ourselves.
E <- p * N ## expected frequency under Null Hypothesis
R <- (O - E) / sqrt(E) ## standardised residuals
X <- sum(R * R) ## Chi-square statistic
# 36.13962
dof <- n - 1 ## degree of freedom
# 30
p.value <- 1 - pchisq(X, df = dof) ## p-value
# 0.2035416
The p-value is larger than 0.05, hence we can not reject Null Hypothesis. Summary plot:
z <- curve(dchisq(x, df = dof), from = 0, to = 100)
abline(v = X, lty = 3)
polygon(c(X, X, z$x[z$x > X]), c(0, dchisq(X, df = dof), z$y[z$x > X]), col = 2)

Then we use chisq.test()
chisq.test(O, p = p)
#Chi-squared test for given probabilities
#data: O
#X-squared = 36.14, df = 30, p-value = 0.2035
You can see, the result is the same.