5

I need to find n!%1000000009. n is of type 2^k for k in range 1 to 20. The function I'm using is:

#define llu unsigned long long
#define MOD 1000000009

llu mulmod(llu a,llu b) // This function calculates (a*b)%MOD caring about overflows
{
    llu x=0,y=a%MOD;
    while(b > 0)
    {
        if(b%2 == 1)
        {
            x = (x+y)%MOD;
        }
        y = (y*2)%MOD;
        b /= 2;
    }
    return (x%MOD);
}

llu fun(int n)   // This function returns answer to my query ie. n!%MOD
{
    llu ans=1;
    for(int j=1; j<=n; j++)
    {
        ans=mulmod(ans,j);
    }
    return ans;
}

My demand is such that I need to call the function 'fun', n/2 times. My code runs too slow for values of k around 15. Is there a way to go faster?

EDIT: In actual I'm calculating 2*[(i-1)C(2^(k-1)-1)]*[((2^(k-1))!)^2] for all i in range 2^(k-1) to 2^k. My program demands (nCr)%MOD caring about overflows.

EDIT: I need an efficient way to find nCr%MOD for large n.

CPPCoder
  • 155
  • 1
  • 1
  • 10
  • Maybe there is a way to reduce the complexity ..? `(2^k)!` isn't exactly a "slow" growing function .. – user2864740 Mar 11 '14 at 18:53
  • Actually the complexity of my program depends mostly on the value of k. – CPPCoder Mar 11 '14 at 18:54
  • 1
    Complexity as in [growth rate](http://en.wikipedia.org/wiki/Big_O_notation); since it "runs too slow for values of k around 15", then maybe the algorithm is simply unfeasible to solve larger k's (say as it approaches 20). – user2864740 Mar 11 '14 at 18:55
  • In actual I'm calculating 2*[(i-1)C(2^(k-1)-1)]*[((2^(k-1))!)^2] for all i in range 2^(k-1) to 2^k. My program demands (nCr)%MOD caring about overflows. – CPPCoder Mar 11 '14 at 19:01
  • 1
    Is there a way to handle this overflow part better? – CPPCoder Mar 11 '14 at 19:03
  • 2
    If what you denote nCr are the binomial coefficients, there is a better way than computing the factorials: use the recurrence nCr+1 = nCr.(n-r)/(r+1). Faster and requiring smaller numbers. –  Mar 11 '14 at 19:15
  • Are you only calculating `(2^(k-1))!^2` once or do you recalculate this for every value of `i`? Just moving that calculation out of the loop would save a lot of time. If you have enabled optimizations on your compiler, then there's a decent chance it's moving it out of the loop for you. – SirGuy Mar 11 '14 at 19:17
  • @GuyGreer I'm calculating this term only once – CPPCoder Mar 11 '14 at 19:19
  • Please fix your indentation, it's terrible. – d33tah Mar 11 '14 at 19:23
  • @d33tah I've fixed it. Sorry for that. – CPPCoder Mar 11 '14 at 19:34
  • Why you don't hold answers in an array? When you count for k, you during that you receive all answers for i < k, too. Just in fun write `ans[0]=1;...ans[j]=mulmod(ans[j-1],j);` and return the array. – Tacet Mar 11 '14 at 20:49
  • possible duplicate of [Fast way to calculate n! mod m where m is prime?](http://stackoverflow.com/questions/9727962/fast-way-to-calculate-n-mod-m-where-m-is-prime) – Brett Hale Mar 13 '14 at 13:14
  • For k = 20, (2^k - 1)*2^k still well be in `int64_t`'s range so you can do normal multiplication and modulus in one loop without worrying about overflow – phuclv Mar 14 '14 at 12:27
  • See my answer to [this similar (duplicate) question][1] for a working solution and explanation. [1]: http://stackoverflow.com/questions/24444917/to-find-combination-value-of-large-numbers/29785514#29785514 – Jim Wood Apr 22 '15 at 04:16

2 Answers2

2

Since you are looking for nCr for multiple sequential values of n you can make use of the following:

    (n+1)Cr = (n+1)! / ((r!)*(n+1-r)!)
    (n+1)Cr = n!*(n+1) / ((r!)*(n-r)!*(n+1-r))
    (n+1)Cr = n! / ((r!)*(n-r)!) * (n+1)/(n+1-r)
    (n+1)Cr = nCr * (n+1)/(n+1-r)

This saves you from explicitly calling the factorial function for each i.

Furthermore, to save that first call to nCr you can use:

    nC(n-1) = n //where n in your case is 2^(k-1).

EDIT:
As Aki Suihkonen pointed out, (a/b) % m != a%m / b%m. So the method above so the method above won't work right out of the box. There are two different solutions to this:

  1. 1000000009 is prime, this means that a/b % m == a*c % m where c is the inverse of b modulo m. You can find an explanation of how to calculate it here and follow the link to the Extended Euclidean Algorithm for more on how to calculate it.

  2. The other option which might be easier is to recognize that since nCr * (n+1)/(n+1-r) must give an integer, it must be possible to write n+1-r == a*b where a | nCr and b | n+1 (the | here means divides, you can rewrite that as nCr % a == 0 if you like). Without loss of generality, let a = gcd(n+1-r,nCr) and then let b = (n+1-r) / a. This gives (n+1)Cr == (nCr / a) * ((n+1) / b) % MOD. Now your divisions are guaranteed to be exact, so you just calculate them and then proceed with the multiplication as before. EDIT As per the comments, I don't believe this method will work.

Another thing I might try is in your llu mulmod(llu a,llu b)

    llu mulmod(llu a,llu b)
    {
        llu q = a * b;
        if(q < a || q < b) // Overflow!
        {
            llu x=0,y=a%MOD;
            while(b > 0)
            {
                if(b%2 == 1)
                {
                    x = (x+y)%MOD;
                }
                y = (y*2)%MOD;
                b /= 2;
            }
            return (x%MOD);
        }
        else
        {
            return q % MOD;
        }
    }

That could also save some precious time.

SirGuy
  • 10,660
  • 2
  • 36
  • 66
  • Writing (n+1)Cr = nCr * (n+1)/(n+1-r) % MOD will handle overflows I guess? – CPPCoder Mar 11 '14 at 19:55
  • @CPPCoder yes, since `a%m * b%m == (a*b)%m` – SirGuy Mar 11 '14 at 21:06
  • No, it doesn't, because the '/' operator doesn't work as you expect under modular arithmetic. – Aki Suihkonen Mar 13 '14 at 07:47
  • @AkiSuihkonen yes, you're right. I've added a part on how to get around that. – SirGuy Mar 13 '14 at 12:49
  • `(nCr / a)` is not necessary possible, but would have to be calculated with the method 1: `nCr * a^(-1)`. – Aki Suihkonen Mar 13 '14 at 13:42
  • @AkiSuihkonen It does hold because `(n+1-r) | (nCr * (n+1))` regardless of being modulo some value or not. Everything derived in method 2 comes from this fact alone and does not use anything from modular arithmetic to arrive at this result. – SirGuy Mar 13 '14 at 14:27
  • @AkiSuihkonen The more I think about it, the more I think you're right and method 2 won't work (unless you do `nCr * a^1`). `nCr / a` is always possible, as I said, but `(nCr % m) / a` is not (or at least, I can't prove to myself that it is possible) – SirGuy Mar 17 '14 at 12:19
2

The mulmod routine can be speeded up by a large factor K.

1) '%' is overkill, since (a + b) are both less than N.
- It's enough to evaluate c = a+b; if (c>=N) c-=N;
2) Multiple bits can be processed at once; see optimization to "Russian peasant's algorithm"
3) a * b is actually small enough to fit 64-bit unsigned long long without overflow

Since the actual problem is about nCr mod M, the high level optimization requires using the recurrence

(n+1)Cr mod M = (n+1)nCr / (n+1-r) mod M.

Because the left side of the formula ((nCr) mod M)*(n+1) is not divisible by (n+1-r), the division needs to be implemented as multiplication with the modular inverse: (n+r-1)^(-1). The modular inverse b^(-1) is b^(M-1), for M being prime. (Otherwise it's b^(phi(M)), where phi is Euler's Totient function.)

The modular exponentiation is most commonly implemented with repeated squaring, which requires in this case ~45 modular multiplications per divisor.

If you can use the recurrence

nC(r+1) mod M = nCr * (n-r) / (r+1) mod M

It's only necessary to calculate (r+1)^(M-1) mod M once.

Community
  • 1
  • 1
Aki Suihkonen
  • 19,144
  • 1
  • 36
  • 57
  • I couldn't understand what do you mean by "try to calculate func(2*a[+1]) from func(a)". Can you please explain? – CPPCoder Mar 11 '14 at 20:33
  • It's good if you can calculate f(N+1) from f(N) and N. But that is not nearly as efficient compared to the case when you can calculate f(2N) or f(2N+1) from f(N). When N is large, f(1000000) requires still one million iterations, first to f(999999) and so on. The second format needs a recursive call to f(500000), then to f(250000) and so on. Does that optimization or recurrence exist? I don't know. – Aki Suihkonen Mar 11 '14 at 20:38
  • Computing func(2*a) as a function of func(a) is certainly more ideal in general, but it wouldn't help in this case because he is already looping through all the values, meaning he needs to calculate f(0) -> f(1) -> f(2) -> f(3) -> ... -> f(n). It would be easier (and just as fast to run and easier to code) to simply work out f(a) as a function of f(a-1). – SirGuy Mar 12 '14 at 01:54