10

For a class assignment, I need to create a function that calculates n Choose k. I did just that, and it works fine with small numbers (e.g. 6 choose 2), but I'm supposed to get it work with 200 choose 50, where it naturally doesn't. The answer is too large and R outputs NaN or Inf, saying:

> q5(200, 50)
[1] "NaN"
Warning message:
In factorial(n) : value out of range in 'gammafn'

I tried using logs and exponents, but it doesn't cut it.

q5 <- function (n, k) {
  answer <- log(exp( factorial(n) / ( (factorial(k)) * (factorial(n - k)) )))
  paste0(answer)
}
Mike Wise
  • 22,131
  • 8
  • 81
  • 104
Ronny Efronny
  • 1,148
  • 9
  • 28

4 Answers4

8

The answer to the actual question is that R cannot show numbers it cannot represent, and some of the terms in your equation are too big to represent. So it fails. However there are approximations to factorial that can be used - they work with logarithms which get big a lot slower.

The most famous one, Sterling's approximation, was not accurate enough, but the Ramanujan's approximation came to the rescue :)

ramanujan <- function(n){
  n*log(n) - n + log(n*(1 + 4*n*(1+2*n)))/6 + log(pi)/2
}

nchoosek <- function(n,k){
  factorial(n)/(factorial(k)*factorial(n-k))
} 

bignchoosek <- function(n,k){
  exp(ramanujan(n) - ramanujan(k) - ramanujan(n-k))
}

nchoosek(20,5)
# [1] 15504

bignchoosek(20,5)
# [1] 15504.06


bignchoosek(200,50)
# [1] 4.538584e+47
Mike Wise
  • 22,131
  • 8
  • 81
  • 104
3

The packages for large numbers:

Brobdingnag package for "Very large numbers in R":
https://cran.r-project.org/web/packages/Brobdingnag/index.html
Paper: https://www.researchgate.net/publication/251996764_Very_large_numbers_in_R_Introducing_package_Brobdingnag

library(Brobdingnag)
googol <- as.brob(10)^100 # googol:=10^100
googol
# [1] +exp(230.26) # exponential notation is convenient for very large numbers

gmp package for multiple Precision Arithmetic (big integers and rationals, prime number tests, matrix computation):
https://cran.r-project.org/web/packages/gmp/index.html

Erdogan CEVHER
  • 1,788
  • 1
  • 21
  • 40
3

You can try this too:

q5 <- function (n, k) {
  # nchoosek = (n-k+1)(n-k+2)...n / (1.2...k)
  return(prod(sapply(1:k, function(i)(n-k+i)/(i))))
}

q5(200, 50)
#[1] 4.538584e+47

or in log domain

q5 <- function (n, k) {
  # ln (nchoosek) = ln(n-k+1) + ln(n-k+2) + ...+ ln(n) - ln(1) -ln(2) - ...- ln(k)
  return(exp(sum(sapply(1:k, function(i)(log(n-k+i) - log(i))))))
}
q5(200, 50)
#[1] 4.538584e+47
Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63
2

This solution calculates the complete row of the Pascal triangle:

x <- 1
print(x)
for (i in 1:200) { x <- c(0, x) + c(x, 0); print(x) }
x[51]  ### 200 choose 50
## > x[51]
## [1] 4.538584e+47

(as I proposed for How would you program Pascal's triangle in R? )
If you want to speed up the code then do not the print(x) (output is a relative slow operation).

To put the code in a function we can do

nchoosek <- function(n,k) {
    x <- 1
    for (i in 1:n) x <- c(0, x) + c(x, 0)
    x[k+1]  ### n choose k
} 

nchoosek(200, 50)    ### testing the function
## [1] 4.538584e+47

Here is a more refined version of my function:

nchoosek <- function(n, k) {
  if (k==0) return(1)
  if (k+k > n) k <- n-k
  if (k==0) return(1)
  x <- 1
  for (i in 1:k)     x <- c(0, x) + c(x, 0)
  for (i in 1:(n-k)) x <- x + c(0, head(x, -1))
  tail(x, 1)
} 
nchoosek(200, 50)    ### testing the function
## [1] 4.538584e+47
Community
  • 1
  • 1
jogo
  • 12,469
  • 11
  • 37
  • 42
  • It is quite a nice solution. But it seems a bit like magic the way you have presented it and it is also not clear how to use it. Wrapping it in a function like I did with `nchoosek(n,k)` would help a lot and only add a couple of lines. – Mike Wise Nov 10 '16 at 19:58
  • I edited my answer. – jogo Nov 10 '16 at 22:19