7

Background:

Given n balls such that:

'a' balls are of colour GREEN
'b' balls are of colour BLUE
'c' balls are of colour RED
...

(of course a + b + c + ... = n)

The number of permutations in which these balls can be arranged is given by:

perm = n! / (a! b! c! ..)

Question 1: How can I 'elegantly' calculate perm so as to avoid an integer overflow as as long as possible, and be sure that when I am done calculating, I either have the correct value of perm, or I know that the final result will overflow?

Basically, I want to avoid using something like GNU GMP.

Optionally, Question 2: Is this a really bad idea, and should I just go ahead and use GMP?

ArjunShankar
  • 23,020
  • 5
  • 61
  • 83
  • Why do you want to avoid GMP? Generally, you want to do the least work you can. – Dave Dec 21 '11 at 14:54
  • Detecting overflow is actually one of C's weaknesses. Suppose you manage to avoid overflow as long as possible, and can therefore be sure that you'll have the correct value if-and-only-if it was possible to compute it without overflow. Even then, you still won't know whether overflow actually occurred. – ruakh Dec 21 '11 at 14:59
  • 2
    @Dave: You are right. But the problem is interesting, nonetheless. So the question remains for those who care about the 'how' more than the 'why' :). Maybe somebody ends up using it in an 8051 in an interactive toaster :P – ArjunShankar Dec 21 '11 at 14:59

5 Answers5

6

These are known as the multinomial coefficients, which I shall denote by m(a,b,...).

And you can efficiently calculate them avoiding overflow by exploiting this identity (which should be fairly simple to prove):

m(a,b,c,...) = m(a-1,b,c,...) + m(a,b-1,c,...) + m(a,b,c-1,...) + ...
m(0,0,0,...) = 1 // base case
m(anything negative) = 0 // base case 2

Then it's a simple matter of using recursion to calculate the coefficients. Note that to avoid an exponential running time, you need to either cache your results (to avoid recomputation) or use dynamic programming.

To check for overflow, just make sure the sums won't overflow.

And yes, it's a very bad idea to use arbitrary precision libraries to do this simple task.

tskuzzy
  • 35,812
  • 14
  • 73
  • 140
5

If you have globs of cpu time, you can make lists out of all the factorials, then find the prime factorization of all the numbers in the lists, then cancel all the numbers on the top with those on the bottom, until the numbers are completely reduced.

Dave
  • 10,964
  • 3
  • 32
  • 54
  • How large do you expect N to get? – Dave Dec 21 '11 at 15:10
  • N represents the number of instructions in a 'basic block' being optimized by a compiler back end. So N can be quite large if somebody writes a bunch of code without control statements [but still an integer]. The algorithm might prove useful to a colleague of mine who is implementing an obscure optimization pass for an obscure DSP architecture. The need to avoid GMP is more a whim than a requirement. – ArjunShankar Dec 21 '11 at 15:20
  • For the purposes of this problem, this algorithm is fast enough. However the time spent coding up a factoring and cancellation algorithm is rather overkill IMO. – tskuzzy Dec 21 '11 at 15:22
  • Agreed. I'm marking this correct now because it is a clean, and IMO elegant solution. – ArjunShankar Dec 21 '11 at 23:24
3

The overflow-safest way is the way Dave suggested. You find the exponent with which the prime p divides n! by the summation

m = n;
e = 0;
do{
    m /= p;
    e += m;
}while(m > 0);

Subtract the exponents of p in the factorisations of a! etc. Do that for all primes <= n, and you have the factorisation of the multinomial coefficient. That calculation overflows if and only if the final result overflows. But the multinomial coefficients grow rather fast, so you will have overflow already for fairly small n. For substantial calculations, you will need a bignum library (if you don't need exact results, you can get by a bit longer using doubles).

Even if you use a bignum library, it is worthwhile to keep intermediate results from getting too large, so instead of calculating the factorials and dividing huge numbers, it is better to calculate the parts in sequence,

n!/(a! * b! * c! * ...) = n! / (a! * (n-a)!) * (n-a)! / (b! * (n-a-b)!) * ...

and to compute each of these factors, let's take the second for illustration,

(n-a)! / (b! * (n-a-b)!) = \prod_{i = 1}^b (n-a+1-i)/i

is calculated with

prod = 1
for i = 1 to b
    prod = prod * (n-a+1-i)
    prod = prod / i

Finally multiply the parts. This requires n divisions and n + number_of_parts - 1 multiplications, keeping the intermediate results moderately small.

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
1

According to this link, you can calculate multinomial coefficients as a product of several binomial coefficients, controlling integer overflow on the way.

This reduces original problem to overflow-controlled computation of a binomial coefficient.

Evgeny Kluev
  • 24,287
  • 7
  • 55
  • 98
-2

Notations: n! = prod(1,n) where prod you may guess what does.

It's very easy, but first you must know that for any 2 positive integers (i, n > 0) that expression is a positive integer:

prod(i,i+n-1)/prod(1,n)

Thus the idea is to slice the computation of n! in small chunks and to divide asap.

With a, than with b and so on.

perm = (a!/a!) * (prod(a+1, a+b)/b!) * ... * (prod(a+b+c+...y+1,n)/z!)

Each of these factors is an integer, so if perm will not overflow, neither one of its factors will do.

Though, in the calculation of a such factor could be an overflow in numerator or denominator but that's avoidable doing a multiplication in numerator then a division in alternance:

prod(a+1, a+b)/b! = (a+1)(a+2)/2*(a+3)/3*..*(a+b)/b

In that way every division will yield an integer.

Darth Hunterix
  • 1,484
  • 5
  • 27
  • 31
Mihai Manole
  • 195
  • 1
  • 7