Does python fail in handling large numbers accurately?

Question

I have made python code to calculate the following expression:

With C_n beeing the n:th catalan number defined by C_n = (2nCn) / (n+1)

from math import comb
import sys
from decimal import Decimal

def factorial(n):
    fact = 1
    for i in range(1, n+1):
        fact = fact * i
    return fact

sum = 0

n = int(sys.stdin.readline())

for i in range(n+1):
    sum += int((((factorial(2 * i))/(factorial(i) * factorial(i)))/(i+1)) * (((factorial(2 * (n - i)))/(factorial(n - i) * factorial(n - i)))/(n - i + 1)))

print(sum)

The input 59 returns: 1583850964596120056618874138787840 when the expected output is: 1583850964596120042686772779038896

Why is this?

Does this answer your question? [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) — mkrieger1, Jun 05 '23 at 15:20
The `/` operator is used with `int` values which results in `float` values, so the answer to the question in the title in this case is "yes". — mkrieger1, Jun 05 '23 at 15:20
As an aside, you'll may want to look into making your factorial function recursive and using the `@cache` decorator from the functools module to speed things up a bit for larger n. — nigh_anxiety, Jun 05 '23 at 15:31
On a math programming note: _never_ compute factorials if you ever _remotely_ have that option. Figure out what math you're trying to do, then rewrite compute "the thing you need factorials for" directly whenever possible. _Especially_ if you're dividing factorials by factorials, don't compute each factorial first: they cancel parts of each other out, don't even compute those parts. — Mike 'Pomax' Kamermans, Jun 05 '23 at 15:51
specifically in this case, Catalan number `Cn` is defined as `1/(n+1) * 2n choose n`, and that last term _does not need factorials at all_, we can generate binomial coefficients [without ever needing factorials using a pascal's triangle lookup](https://stackoverflow.com/a/37716142/740553). — Mike 'Pomax' Kamermans, Jun 05 '23 at 16:06
Were you thinking of the `fractions` module (exact rational arithmetic) rather than `decimal` (essentially, floating-point with a base-10 rather than base-2 representation)? — slothrop, Jun 05 '23 at 16:07

score 2 · Answer 1 · answered Jun 05 '23 at 18:45

It's because you use /, which results in a float, which has fixed precision, not enough to represent the number accurately. Simply use // instead, then you stay in int values, which have arbitrary precision.

As Mike points out, your S_n is actually C_n+1, so you can instead much more efficiently just do this:

n += 1
print(comb(2*n, n) // (n+1))

Mike 'Pomax' Kamermans · Answer 2 · 2023-06-05T18:40:05.813

Note that you should never compute factorials if you can help it, and if you're implementing maths, don't compute a recurrence relation if there is a direct formula that you can use. (That's not always possible, but the majority of the time it is)

Catalan numbers have the direct formula Cn = 1/(n+1) * 2n choose n, and while "on paper" you can write 2n choose n using factorials divided by factorials, in computer land you don't want to make that substitution because factorials are famously huge numbers, and if we want a binomial coefficient we don't actually need them: we can near-trivially compute any binomial coefficient with Pascal's triangle, or even more trivially by using a prebuilt binomial function such as the scipy.special.binom function. However, at very large numbers scipy (ironically) becomes imprecise, so let's just write our own:

binomials = [[1]]

def binom(n,k):
  while n >= len(binomials):
      s = len(binomials)
      next = [0] * (s+1) 
      next[0] = 1
      for i in range(1,s):
          next[i] = binomials[s-1][i-1] + binomials[s-1][i]
      next[s] = 1;
      binomials.append(next)
  return binomials[n][k]

There, this is all pure integer math, and python has infinite precision ints, so as long as we have enough memory this binom function will always give us the correct answer.

with that, let's compute things:

def Cn(n):
    return binom(2*n, n) / (n+1)

def calcSum(n):
    sum = 0
    for k in range(0, n+1):
      sum += Cn(k) * Cn(n-k)
    return "{:f}".format(sum)

And let's test calcSum(59):

>>> calcSum(59)
'1583850964596120221000260537810944.000000'
>>>

Now, that's not the right answer of course (that would be 1583850964596120042686772779038896, so we're off by a factor 10^18), but it is "the right answer for fixed precision IEEE floating point numbers to yield" without any sort of factorial problems.

Of course, we could add in Decimal:

from decimal import Decimal

def Cn(n):
    return Decimal(binom(2*n, n)) / Decimal((n+1))

def calcSum(n):
    sum = Decimal(0)
    for k in range(0, n+1):
      sum += Cn(k) * Cn(n-k)
    return sum

This gives us Decimal('1.583850964596120042686772779E+33') which is clearly a bad answer (if we print it without scientific notation it turns into 1583850964596120042686772779000000) and that E+33 tells us that we forgot to set the decimal precision to something that has at least as many digits as the answer needs. So, let's just use 100 digits of precision:

from decimal import Decimal, getcontext
getcontext().prec = 100

And now the answer is Decimal('1583850964596120042686772779038896').

Done: that is the correct answer.

So your code has two problems. First: don't use factorials when you're programming, use the thing you actually need. In this case, binomial coefficients. Second: use the things Python has for getting the precision you need. You imported Decimal but didn't use it, but even if you had, your code was missing the instruction to set the decimal precision to a number high enough to get the right answer.

The most important lesson here is don't blindly implement math formulae because the symbolic maths that we work out by hand have been optimized over the centuries to be easy for "I know a whole bunch of susbtitution rules from years of study" human brains. So unless you're working in a programming language that can do that same symbolic substitution work (e.g. Mathematica), you're going to have to think like a computer. If you have a recurrence relation, find the non-recurrence equivalent function and implement that. And if you ever see a factorial, ask yourself what it's used for, then see if there's a clean and efficient way to compute that instead.

Does python fail in handling large numbers accurately?

2 Answers2