The problem is caused by R's numerical precision limits. As commenters noted, the n choose k values I am computing above are really, really big (choose(80,40) = 1.075072e+23
).
We can use logs to try and keep the problem within R's computational limits. This is the implementation of Ramanujan's approach. Unfortunately, the errors in the approximation compound, and the accuracy decays even faster. The probability function requires adding and subtracting a sequence of very large numbers to get a final value between 0 and 1, and is not tolerant of any imprecision.
0) Rewrite the probability function to be broken into steps
probability <- function(s, m, n) {
# Probability of getting less than s
i <- 0:((s-1-n) / m)
c1 <- choose(n, i)
c2 <- choose(s - 1 - i * m, n)
seq <- (-1)^i * (c1 * c2)
m^(-n) * sum(seq)
}
1) implement an approximation of log(x!)
# using the 'ramanujan' method
ramanujan <- function(n){
n * log(n) - n + log(n * (1 + 4*n * (1 + 2*n))) / 6 + log(pi) / 2
}
# confirm Ramanujan works correctly
n <- 1:200
diff <- log(factorial(n)) - ramanujan(n)
plot(n, diff) # r returns inf for factorial(171), but up to there the numbers match
2) rewrite choose
function using log approximation.
#' This function returns log(choose(n,k))
log_nck <- Vectorize(function(n, k) {
if(n <= k | n < 1 | k < 1) return(log(choose(n,k))) # logs don't like 0 or neg numbers
return((ramanujan(n) - ramanujan(k) - ramanujan(n-k)))
})
# Check that choose function works
n <- seq(10, 100, 10)
k <- seq(5, 50, 5)
c_real <- log(choose(n, k))
c_approx <- log_nck(n, k)
# If we print them, they appear to match
print(c_real)
print(c_approx)
# and the difference shows pretty small errors.
print(c_real - c_approx)
3) rewrite the probability function using log choose.
new_probability <- function(s, m, n) {
# Probability of getting less than s
i <- 0:((s-1-n) / m)
c1 <- log_nck(n, i)
c2 <- log_nck(s - 1 - i * m, n)
seq <- (-1)^i * exp(c1 + c2)
return(m^(-n) * sum(seq))
}
Final Testing
n <- 1:90 # number of dice
m <- 6 # number of sides
s <- floor(mean(1:m)*n) # sum of faces
p <- mapply(probability, s = s, m = m, n = n)
newp <- mapply(new_probability, s = s, m = m, n = n)
plot(n, p, main = "Original in black, approximation in red")
points(n, newp, col = "red")
