47

I was curious if there was a good way to do this. My current code is something like:

def factorialMod(n, modulus):
    ans=1
    for i in range(1,n+1):
        ans = ans * i % modulus    
    return ans % modulus

But it seems quite slow!

I also can't calculate n! and then apply the prime modulus because sometimes n is so large that n! is just not feasible to calculate explicitly.

I also came across http://en.wikipedia.org/wiki/Stirling%27s_approximation and wonder if this can be used at all here in some way?

Or, how might I create a recursive, memoized function in C++?

Tobias Kienzler
  • 25,759
  • 22
  • 127
  • 221
John Smith
  • 11,678
  • 17
  • 46
  • 51
  • How slow is slow? From your pseudocode, I infer you're computing this in Python, is that right? – Fred Foo Mar 15 '12 at 20:56
  • Any language, really; it's pretty much the same in C++ in terms of syntax. I chose Python here because it's easy to read. Even in C++, though, I need a faster function. – John Smith Mar 15 '12 at 20:58
  • 3
    There's a very fast way to do this using invariant multiplication or possibly [Montgomery reduction](http://en.wikipedia.org/wiki/Montgomery_reduction). Both methods eliminate the modulus and will allow for loop-unrolling techniques. – Mysticial Mar 15 '12 at 21:25
  • 1
    You can break down modulus into prime factors to identify cases that will be zero more easily, although that won't help for large prime factors - how helpful this is depends on what you know about the modulus, if anything, and if prime factorisation tickles your fancy. – davin Mar 15 '12 at 21:26
  • You can shave a bit of time off by only doing the mod if ans > modulus (credit: http://tech.groups.yahoo.com/group/primenumbers/messages/1095?threaded=1&m=e&var=1&tidx=1 ) – Andrew Morton Mar 15 '12 at 21:32
  • @kilotaras: Yes, M is prime (sorry, did not know this was relevant; will change post) – John Smith Mar 15 '12 at 21:50
  • @John: I've improved my answer, and included some Python code. This version should improve performance greatly when `n >> m/2` *(performance will be the same when `n <= m/2`)*. I really don't see how memoization could be implemented with this problem - there is nothing to memoize. – BlueRaja - Danny Pflughoeft Mar 16 '12 at 18:37
  • If `n! mod m` could be calculated in polynomial time (in the number of bits), factoring would be in P. At least so they say on this complexity blog: http://rjlipton.wordpress.com/2013/05/06/a-most-perplexing-mystery – Thomas Ahle May 11 '13 at 15:49
  • In fact, the link that @ThomasAhle meant to give is: https://rjlipton.wordpress.com/2009/02/23/factoring-and-factorials . As a side note, I think that the ideas expressed there are flawed, as noticed in a comment on that page. No wonder there is no other mention of this fact on the whole internet. – Alex M. Jan 27 '15 at 00:14

8 Answers

46

n can be arbitrarily large

Well, n can't be arbitrarily large - if n >= m, then n! ≡ 0 (mod m) (because m is one of the factors, by the definition of factorial).


Assuming n << m and you need an exact value, your algorithm can't get any faster, to my knowledge. However, if n > m/2, you can use the following identity (Wilson's theorem - Thanks @Daniel Fischer!)

(m-1)! ≡ -1 (mod m)

to cap the number of multiplications at about m-n:

(m-1)! ≡ -1 (mod m)
1 * 2 * 3 * ... * (n-1) * n * (n+1) * ... * (m-2) * (m-1) ≡ -1 (mod m)
n! * (n+1) * ... * (m-2) * (m-1) ≡ -1 (mod m)
n! ≡ -[(n+1) * ... * (m-2) * (m-1)]^(-1) (mod m)

This gives us a simple way to calculate n! (mod m) in m-n-1 multiplications, plus a modular inverse:

def factorialMod(n, modulus):
    ans=1
    if n <= modulus//2:
        #calculate the factorial normally (right argument of range() is exclusive)
        for i in range(1,n+1):
            ans = (ans * i) % modulus   
    else:
        #Fancypants method for large n
        for i in range(n+1,modulus):
            ans = (ans * i) % modulus
        ans = modinv(ans, modulus)
        ans = -1*ans + modulus
    return ans % modulus
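Note that `modinv` is not a built-in and isn't shown in the answer; a minimal sketch via the extended Euclidean algorithm (valid here since `m` is prime, so every `0 < a < m` is invertible) could be:

```python
def modinv(a, m):
    # Modular inverse via the extended Euclidean algorithm.
    # Assumes gcd(a, m) == 1, which holds here since m is prime
    # and 0 < a < m.
    g, x = m, 0
    r, y = a % m, 1
    while r != 0:
        q = g // r
        g, r = r, g - q * r
        x, y = y, x - q * y
    return x % m
```

On Python 3.8+, the three-argument `pow(a, -1, m)` computes the same thing directly.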

We can rephrase the above equation in another way, which may or may not perform slightly faster. Using the following identity:

x ≡ x - m (mod m)

we can rephrase the equation as

n! ≡ -[(n+1) * ... * (m-2) * (m-1)]^(-1) (mod m)
n! ≡ -[(n+1-m) * ... * (m-2-m) * (m-1-m)]^(-1) (mod m)
       (reverse order of terms)
n! ≡ -[(-1) * (-2) * ... * -(m-n-2) * -(m-n-1)]^(-1) (mod m)
n! ≡ -[(1) * (2) * ... * (m-n-2) * (m-n-1) * (-1)^(m-n-1)]^(-1) (mod m)
n! ≡ [(m-n-1)!]^(-1) * (-1)^(m-n) (mod m)

This can be written in Python as follows:

def factorialMod(n, modulus):
    ans=1
    if n <= modulus//2:
        #calculate the factorial normally (right argument of range() is exclusive)
        for i in range(1,n+1):
            ans = (ans * i) % modulus   
    else:
        #Fancypants method for large n
        for i in range(1,modulus-n):
            ans = (ans * i) % modulus
        ans = modinv(ans, modulus)

        #Since m is an odd-prime, (-1)^(m-n) = -1 if n is even, +1 if n is odd
        if n % 2 == 0:
            ans = -1*ans + modulus
    return ans % modulus
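As a sanity check (not part of the original answer), the final identity can be verified numerically for a small odd prime, using Python 3.8+'s `pow(x, -1, m)` for the modular inverse:

```python
from math import factorial

# Verify n! ≡ [(m-n-1)!]^(-1) * (-1)^(m-n) (mod m) for a small odd prime m,
# over the range m/2 < n < m where the "fancypants" branch applies.
m = 101
for n in range(m // 2 + 1, m):
    lhs = factorial(n) % m
    rhs = pow(factorial(m - n - 1), -1, m) * (-1) ** (m - n) % m
    assert lhs == rhs
```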

If you don't need an exact value, life gets a bit easier - you can use Stirling's approximation to calculate an approximate value in O(log n) time (using exponentiation by squaring).
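A quick sketch of that approximation (of ln n! via the first-order Stirling formula, compared against `math.lgamma` for reference; note this gives the magnitude of n!, not n! mod m, so it only helps when an approximate value is acceptable):

```python
import math

def log_factorial_stirling(n):
    # First-order Stirling approximation: ln(n!) ≈ n ln n - n + 0.5 ln(2*pi*n)
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)

exact = math.lgamma(1000 + 1)          # lgamma(n+1) == ln(n!)
approx = log_factorial_stirling(1000)  # absolute error is about 1/(12n)
```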


Finally, I should mention that if this is time-critical and you're using Python, try switching to C++. From personal experience, you should expect about an order-of-magnitude increase in speed or more, simply because this is exactly the sort of CPU-bound tight-loop that natively-compiled code excels at (also, for whatever reason, GMP seems much more finely-tuned than Python's Bignum).

BlueRaja - Danny Pflughoeft
  • 84,206
  • 33
  • 197
  • 283
  • 2
    "Thus, when `m/2 < n < m`, you only need to calculate `(m/2)! * (-2)^(n-m/2-1) (mod m)`" You can do better then. By Wilson's theorem, `(m-1)! ≡ -1 (mod m)` if `m` is prime. Now `(m-1)! = n! * (m - (m-n-1)) * ... * (m - 1) ≡ (-1)^(m-n-1) * n! * (m-n-1)! (mod m)`, so `n! ≡ (-1)^(m-n) * ((m-n-1)!)^(-1) (mod m)`. So you need to calculate `(m-n-1)! mod m`, find its modular inverse (O(log m) steps), and adjust the sign if necessary. Not much difference when `n` is close to `m/2`, but nice when `n > 3m/4` or so. – Daniel Fischer Mar 16 '12 at 10:54
  • @DanielFischer: Thanks! I've included that in the answer. – BlueRaja - Danny Pflughoeft Mar 16 '12 at 18:37
  • I came to this question wondering about the specific case where `m=4k+1` and `n=2k`, `k` possibly VERY large. – Spencer May 18 '22 at 15:29
18

Expanding my comment to an answer:

Yes, there are more efficient ways to do this. But they are extremely messy.

So unless you really need that extra performance, I don't suggest trying to implement these.


The key is to note that the modulus (which is essentially a division) is going to be the bottleneck operation. Fortunately, there are some very fast algorithms that allow you to perform modulus over the same number many times.

These methods are fast because they essentially eliminate the modulus.


Those methods alone should give you a moderate speedup. To be truly efficient, you may need to unroll the loop to allow for better IPC:

Something like this:

ans0 = 1
ans1 = 1
for i in range(1, (n+1) // 2):
    ans0 = ans0 * (2*i + 0) % modulus    
    ans1 = ans1 * (2*i + 1) % modulus    

return ans0 * ans1 % modulus

but accounting for an odd number of iterations and combining it with one of the methods I linked to above.

Some may argue that loop-unrolling should be left to the compiler. I will counter-argue that compilers are currently not smart enough to unroll this particular loop. Have a closer look and you will see why.
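A self-contained Python version of this two-accumulator idea, handling an odd number of iterations (the function name is mine, and the modulus-elimination tricks linked above are omitted), might look like:

```python
def factorial_mod_unrolled(n, modulus):
    # Two independent accumulators give the CPU two shorter
    # multiply/mod dependency chains to overlap.
    ans0 = 1
    ans1 = 1
    i = 2
    while i + 1 <= n:
        ans0 = ans0 * i % modulus
        ans1 = ans1 * (i + 1) % modulus
        i += 2
    if i <= n:  # one leftover factor when the iteration count is odd
        ans0 = ans0 * i % modulus
    return ans0 * ans1 % modulus
```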


Note that although my answer is language-agnostic, it is meant primarily for C or C++.

Mysticial
  • 464,885
  • 45
  • 335
  • 332
  • 1
    It might be nice to get a comment from whoever just downvoted the 3 top answers. – Mysticial Mar 15 '12 at 22:07
  • How might recursion + memoization be done in C++ for factoral mod m? – John Smith Mar 15 '12 at 22:18
  • @JohnSmith TBH, Memoization is probably not going to help at all - there's nothing to memoize. The only way it might become helpful is if you try the prime-factorization approach and use the [windowing algorithm for exponentiation by squaring](http://en.wikipedia.org/wiki/Exponentiation_by_squaring#Sliding_window_method). (The windowing algorithm is a memoization algorithm.) But prime factorizing all integers from `1` to `n` will probably be slower than your current algorithm. – Mysticial Mar 15 '12 at 22:22
  • Well in my case I am iterating from low n to high n, so doesn't that mean I can save time by storing values I've already calculated? For large n it seems like it'd save a lot of time by only doing a couple iterations rather than go from i=1 to n or n/2 – John Smith Mar 15 '12 at 22:23
  • 1
    Well... There's nothing to "save". Knowing a couple iterations won't help you with the rest of them. – Mysticial Mar 15 '12 at 22:27
  • But if I am calculating a bunch of numbers on the order of something like (20 million)! and greater, don't these iterations matter? – John Smith Mar 15 '12 at 22:33
  • I'll just save this as answer and make another question – John Smith Mar 15 '12 at 22:35
  • Ah ok, I was going to ask to clarify what you mean by "iterations matter". But I'll wait for your new question. – Mysticial Mar 15 '12 at 22:37
16

n! mod m can be computed in O(n^(1/2+ε)) operations instead of the naive O(n). This requires use of FFT polynomial multiplication, and is only worthwhile for very large n, e.g. n > 10^4.

An outline of the algorithm and some timings can be seen here: http://fredrikj.net/blog/2012/03/factorials-mod-n-and-wilsons-theorem/

Fredrik Johansson
  • 25,490
  • 3
  • 25
  • 17
7

If we want to calculate M = a*(a+1) * ... * (b-1) * b (mod p), we can use the following approach, assuming we can add, subtract and multiply quickly (mod p), to get a running time of O( sqrt(b-a) * polylog(b-a) ).

For simplicity, assume b-a+1 = k^2 is a perfect square. Now we can divide our product into k blocks, i.e. M = [a*..*(a+k-1)] *...* [(b-k+1)*..*b]. Each factor in this product is of the form p(x) = x*..*(x+k-1), for an appropriate x.

By using a fast polynomial multiplication algorithm, such as the Schönhage–Strassen algorithm, in a divide & conquer manner, one can find the coefficients of the polynomial p(x) in O( k * polylog(k) ). There is also an algorithm for evaluating the same degree-k polynomial at k points in O( k * polylog(k) ), which means we can calculate p(a), p(a+k), ..., p(b-k+1) fast.

This algorithm for evaluating one polynomial at many points is described in the book "Prime Numbers" by C. Pomerance and R. Crandall. Finally, once you have these k values, you can multiply them together in O(k) and get the desired value.

Note that all of our operations are performed (mod p). The exact running time is O( sqrt(b-a) * log(b-a)^2 * log(log(b-a)) ).
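To illustrate just the block decomposition (with naive O(k) evaluation of each block standing in for the fast multipoint evaluation, so this sketch is O(b-a) overall rather than O( sqrt(b-a) * polylog(b-a) ); the function name is mine):

```python
from math import isqrt

def product_range_mod(a, b, p):
    # Computes a*(a+1)*...*b (mod p), split into blocks of k ≈ sqrt(b-a+1)
    # consecutive factors; each block is one value of the polynomial
    # q(x) = x*(x+1)*...*(x+k-1).
    k = isqrt(b - a + 1)
    ans = 1
    x = a
    while x + k - 1 <= b:
        block = 1
        for j in range(k):               # naive O(k) evaluation of q(x);
            block = block * (x + j) % p  # the real algorithm evaluates all
        ans = ans * block % p            # blocks at once via fast multipoint
        x += k                           # evaluation in O(k * polylog(k))
    while x <= b:  # leftover factors when b-a+1 is not a perfect square
        ans = ans * x % p
        x += 1
    return ans
```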

Aboutblank
  • 697
  • 3
  • 14
  • 31
ohad
  • 345
  • 4
  • 7
  • 1
    The algorithm of "substituting many points into one polynomial" is also described in the well-known book "Introduction to Algorithms" by T. H. Cormen and others (in the FFT chapter). – ohad Oct 07 '14 at 17:22
1

Expanding on my comment, this takes about 50% of the time for all n in [100, 100007] where m=(117 | 1117):

Function facmod(n As Integer, m As Integer) As Integer
    Dim f As Integer = 1
    For i As Integer = 2 To n
        f = f * i
        If f >= m Then
            f = f Mod m
        End If
    Next
    Return f
End Function
Andrew Morton
  • 24,203
  • 9
  • 60
  • 84
0

I found this following function on quora:
With f(n,m) = n! mod m;

function f(n,m:int64):int64;
         begin
              if n = 1 then f:= 1
              else f:= ((n mod m)*(f(n-1,m) mod m)) mod m;
         end;

This probably beats using a time-consuming loop and multiplying large numbers stored in strings. It is also applicable to any integer m.
The link where I found this function : https://www.quora.com/How-do-you-calculate-n-mod-m-where-n-is-in-the-1000s-and-m-is-a-very-large-prime-number-eg-n-1000-m-10-9+7

Cong Tran An
  • 125
  • 6
-1

If n = (m - 1) for prime m then by http://en.wikipedia.org/wiki/Wilson's_theorem n! mod m = (m - 1)

Also, as has already been pointed out, n! mod m = 0 if n >= m
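Both facts are easy to check numerically (a quick illustration, not part of the original answer):

```python
from math import factorial

# Wilson's theorem: for prime m, (m-1)! ≡ -1 ≡ m-1 (mod m)
for m in (5, 7, 11, 13, 101):
    assert factorial(m - 1) % m == m - 1

# And n! ≡ 0 (mod m) whenever n >= m, since m is then a factor of n!
assert factorial(11) % 7 == 0
```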

  • This is not helpful. BlueRaja-Danny-Pflughoeft already mentioned Wilson's theorem, and it doesn't do much because you can't count on needing just (m-1)!, or (m-k)! for small k, which his answer covered but yours didn't. – Douglas Zare Apr 04 '15 at 19:23
-11

Assuming that the "mod" operator of your chosen platform is sufficiently fast, you're bounded primarily by the speed at which you can calculate n! and the space you have available to compute it in.

Then it's essentially a 2-step operation:

  1. Calculate n! (there are lots of fast algorithms so I won't repeat any here)
  2. Take the mod of the result

There's no need to complexify things, especially if speed is the critical component. In general, do as few operations inside the loop as you can.

If you need to calculate n! mod m repeatedly, then you may want to memoize the values coming out of the function doing the calculations. As always, it's the classic space/time tradeoff, but lookup tables are very fast.

Lastly, you can combine memoization with recursion (and trampolines as well if needed) to get things really fast.

cdeszaq
  • 30,869
  • 25
  • 117
  • 173
  • 3
    however, for large n, calculating n! and then performing mod is not feasible – John Smith Mar 15 '12 at 21:00
  • 1
    Not feasible...why? Due to memory constraints? From the question, speed was the issue, not memory. If you are looking to have as small a memory footprint as possible and _then_ optimize for speed, please update your question to reflect this. – cdeszaq Mar 15 '12 at 21:01
  • Is there usually a tradeoff between speed and memory when it comes to factorials? I'll update the question either way – John Smith Mar 15 '12 at 21:02
  • @JohnSmith - There's _always_ a speed/memory trade-off with _any_ non-trivial calculation. I have updated my answer with some other non-approximate ways of calculating the values in a less time-dependent way. Typically, you run into issues dealing with gigantic numbers when you deal with factorials, which can lead to memory problems in some cases, but there are ways of handling it. (See Java's BigInt class) – cdeszaq Mar 15 '12 at 21:06
  • 12
    -1 Computing n! and then mod is very slow, please try to compute 2000000! mod 5250307 that way. OP is doing it better in the question, you should interleave multiplication and taking modulo. – sdcvvc Mar 15 '12 at 21:07
  • 7
    @cdeszaq: What you seem to be missing is that multiplying two extremely large numbers (larger than the size of a register) is not `O(1)` on a computer: it's closer to `O(m log m)` *(`m` = #bits)*. Multiplying two m-bit numbers results in (m+m)-bits, so your method takes approximately `m log(m) + 2m log(m) + 3m log(m) + ... + nm log(m) = nm log(m)(n+1)/2 = O(mn^2 log(m))` operations. Taking a modulus after each operation, however, would result in about `2(m log (m)) + 2(m log(m)) + ...n additions... + 2(m log(m)) = 2mn log(m) = O(mn log(m))` which is significantly faster, even for small `n`. – BlueRaja - Danny Pflughoeft Mar 15 '12 at 22:15
  • 1
    Computing `n!` for **very** large `n` is not only slow, but quite impossible because the numbers get so large you can't address them any more. – Bogdan Alexandru Apr 22 '15 at 09:19