I was trying to implement a Miller-Rabin primality test, and was puzzled why it was taking so long (> 20 seconds) for midsize numbers (~7 digits). I eventually found the following line of code to be the source of the problem:

x = a**d % n

(where a, d, and n are all similar, but unequal, midsize numbers, ** is the exponentiation operator, and % is the modulo operator)

I then tried replacing it with the following:

x = pow(a, d, n)

and by comparison it is almost instantaneous.

For context, here is the original function:

from random import randint

def primalityTest(n, k):
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # write n - 1 as 2**s * d with d odd
    s = 0
    d = n - 1
    while d % 2 == 0:
        s += 1
        d >>= 1
    for i in range(k):
        rand = randint(2, n - 2)
        x = rand**d % n         # offending line
        if x == 1 or x == n - 1:
            continue
        toReturn = True
        for r in range(s):
            x = pow(x, 2, n)
            if x == 1:
                return False
            if x == n - 1:
                toReturn = False
                break
        if toReturn:
            return False
    return True

print(primalityTest(2700643, 1))

An example timed calculation:

from timeit import timeit

a = 2505626
d = 1520321
n = 2700643

def testA():
    print(a**d % n)

def testB():
    print(pow(a, d, n))

print("time: %(time)fs" % {"time":timeit("testA()", setup="from __main__ import testA", number=1)})
print("time: %(time)fs" % {"time":timeit("testB()", setup="from __main__ import testB", number=1)})

Output (run with PyPy 1.9.0):

2642565
time: 23.785543s
2642565
time: 0.000030s

Output (run with Python 3.3.0, 2.7.2 returns very similar times):

2642565
time: 14.426975s
2642565
time: 0.000021s

And a related question: why is this calculation almost twice as fast when run with Python 2 or 3 than with PyPy, when usually PyPy is much faster?

lyallcooper

4 Answers

See the Wikipedia article on modular exponentiation. Basically, when you do a**d % n, you actually have to calculate a**d, which could be quite large. But there are ways of computing a**d % n without having to compute a**d itself, and that is what pow does. The ** operator can't do this because it can't "see into the future" to know that you are going to immediately take the modulus.
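
To get a feel for the sizes involved, here is a rough illustration using the question's numbers (the bit count is estimated rather than computed, since actually building a**d is exactly the slow part):

import math

a, d, n = 2505626, 1520321, 2700643

# a**d has about d * log2(a) bits; estimating avoids actually building it.
print(int(d * math.log2(a)))   # roughly 32 million bits, an integer of ~4 MB
print(pow(a, d, n))            # the modular result never grows beyond n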

BrenBarn
  • +1 that's actually what the docstring implies `>>> print pow.__doc__ pow(x, y[, z]) -> number With two arguments, equivalent to x**y. With three arguments, equivalent to (x**y) % z, but may be more efficient (e.g. for longs).` – Hedde van der Heide Jan 03 '13 at 06:06
  • Depending on your Python version, this may only be true under certain conditions. IIRC, in 3.x and 2.7, you can only use the three-argument form with integral types (and non-negative power), and you will always get modular exponentiation with the native `int` type, but not necessarily with other integral types. But in older versions there were rules about fitting into a C `long`, the three-argument form was allowed for `float`, etc. (Hopefully you're not using 2.1 or earlier, and aren't using any custom integral types from C modules, so none of this matters to you.) – abarnert Jan 03 '13 at 06:12
  • From your answer it seems like it is impossible for a compiler to see the expression and optimize it, which isn't true. It *just happens* that no current Python compilers do it. – danielkza Jan 03 '13 at 12:27
  • @danielkza: That's true, I didn't mean to imply it's theoretically impossible. Maybe "doesn't look into the future" would be more accurate than "can't see into the future". Note, though, that the optimization could be extremely difficult or even impossible in general. For *constant* operands it could be optimized, but in `x ** y % n`, `x` could be an object that implements `__pow__` and, based on a random number, returns one of several different objects implementing `__mod__` in ways that also depend on random numbers, etc. – BrenBarn Jan 03 '13 at 19:12
  • @danielkza: Also, the functions don't have the same domain: `.3 ** .4 % .5` is perfectly legal, but if the compiler transformed that into `pow(.3, .4, .5)` that would raise a `TypeError`. The compiler would have to be able to know that `a`, `d`, and `n` are guaranteed to be values of an integral type (or maybe just specifically of type `int`, because the transformation doesn't help otherwise), and `d` is guaranteed to be non-negative. That's something a JIT could conceivably do, but a static compiler for a language with dynamic types and no inference just can't. – abarnert Jan 03 '13 at 19:48
  • @danielkza: Actually, I just thought of a possibility: transform any instance of `a**d % n` into something like `try: __ret=pow(a, d, n) except TypeError: __ret = a**d % n; __ret`. That's going to be a horrible pessimization for anyone who actually wants to use floating-point (a quick test in CPython 3.3 shows 11.7s vs. 0.8s for 10M reps; it's not quite as bad in PyPy), but I'm guessing not too many algorithms need `a**d % n` with floating-point values… – abarnert Jan 03 '13 at 19:52
  • @Hedde: you could use `help(pow)` instead of `print pow.__doc__` – jfs Jan 18 '13 at 01:59
  • Actually, that's wrong: you don't need a JIT, but you need to support multiple internal representations of objects of type "long". Then a**b could return a "long" object that stores internally "I'm a**b, for these values of a and b". If the only thing we do with it is modulo, it can be efficient again. – Armin Rigo Feb 24 '13 at 09:52
  • @ArminRigo As far as I understand you are implying that after the exponentiation is done, the original base and exponent is saved? As far as I can tell, then, the costly operation is already done and taking the result mod n would be more efficient than then using `pow(a,b,n)`. – xuiqzy Feb 10 '21 at 21:26
  • @xuiqzy No, I'm saying that the exponentiation is not done. The operator ``int.__pow__`` would return an object that stores a and b and is able to lazily compute the exponentiation if needed. In practice it cannot really work on CPython because it doesn't support non-standard representations for core types. It could be done for PyPy, though. – Armin Rigo Feb 13 '21 at 19:52

BrenBarn answered your main question. For your aside:

why is it almost twice as fast when run with Python 2 or 3 than with PyPy, when usually PyPy is much faster?

If you read PyPy's performance page, this is exactly the kind of thing PyPy is not good at—in fact, the very first example they give:

Bad examples include doing computations with large longs – which is performed by unoptimizable support code.

Theoretically, turning a huge exponentiation followed by a mod into a modular exponentiation (at least after the first pass) is a transformation a JIT might be able to make… but not PyPy's JIT.

As a side note, if you need to do calculations with huge integers, you may want to look at third-party modules like gmpy, which can be much faster than CPython's native implementation for uses outside the mainstream, and which also has a lot of additional functionality you'd otherwise have to write yourself, at the cost of being less convenient.
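
For example (a minimal sketch, assuming the gmpy2 package, the maintained successor to gmpy, is installed):

from gmpy2 import mpz, powmod

a, d, n = mpz(2505626), mpz(1520321), mpz(2700643)
print(powmod(a, d, n))   # 2642565, same result as pow(a, d, n), computed by GMP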

abarnert
  • longs got fixed. try pypy 2.0 beta 1 (it won't be faster than CPython, but should not be slower either). gmpy does not have a way to handle MemoryError :( – fijal Jan 05 '13 at 21:26
  • @fijal: Yeah, and `gmpy` is also slower instead of faster in a few cases, and makes a lot of simple things less convenient. It's not _always_ the answer—but sometimes it is. So it's worth looking at if you're dealing with huge integers and Python's native type doesn't seem fast enough. – abarnert Jan 06 '13 at 12:01
  • and if you don't care if your numbers being big makes your program segfault – fijal Jan 07 '13 at 18:58
  • @fijal: I don't know why you're so bitterly angry at `gmpy`. Yes, if I accidentally tried to compute Graham's number, it would eventually segfault instead of raising a catchable-but-non-recoverable exception. I shudder to think how long that "eventually" would be on a typical machine with a 64-bit CPU, 8-32GB of RAM, and a few TB of swap space… But anyway: Sometimes that difference is important, sometimes it's not. Making it the only factor in your decision, whether it's actually relevant or not, is silly. – abarnert Jan 07 '13 at 19:11
  • It's the factor that made PyPy not use the GMP library for its longs. It might be ok for you, it's not ok for Python VM developers. The malloc can fail without using a lot of RAM, just put a very large number there. Behavior of GMP from that point on is undefined and Python can't allow this. – fijal Jan 07 '13 at 21:42
  • @fijal: I completely agree that it shouldn't be used for implementing Python's built-in type. That doesn't mean it shouldn't ever be used for anything ever. – abarnert Jan 07 '13 at 23:30

There are shortcuts to doing modular exponentiation: for instance, you can find a**(2**i) mod n for every i from 1 to log(d) and multiply together (mod n) just the intermediate results you need. A dedicated modular-exponentiation function like 3-argument pow() can leverage such tricks because it knows you're doing modular arithmetic. The Python parser can't recognize this from the bare expression a**d % n, so it will perform the full calculation (which will take much longer).
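
A minimal sketch of that idea, using the standard square-and-multiply method (not necessarily the exact algorithm CPython's pow() uses):

def modexp(a, d, n):
    # Compute a**d % n while keeping every intermediate value below n.
    result = 1
    a %= n
    while d > 0:
        if d & 1:                 # this bit of d is set: fold in a**(2**i)
            result = result * a % n
        a = a * a % n             # advance to a**(2**(i+1)) mod n
        d >>= 1
    return result

assert modexp(2505626, 1520321, 2700643) == pow(2505626, 1520321, 2700643)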

atomicinf

The way x = a**d % n is calculated is to raise a to the d power, then reduce the result modulo n. If a is large, this creates a huge number which is then mostly discarded. However, x = pow(a, d, n) is most likely optimized so that only the last n digits are tracked, which are all that are required for calculating multiplication modulo a number.
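
The fact that makes this work is that reduction mod n commutes with multiplication, so reducing after every step never changes the final residue:

a, b, n = 2505626, 1520321, 2700643

# Reducing the operands first gives the same residue as reducing at the end.
assert (a * b) % n == ((a % n) * (b % n)) % n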

Yuushi
  • "it requires d multiplications to calculate x**d" -- not correct. You can do it in O(log d) (very wide) multiplications. Exponentiation by squaring can be used without a modulus. The sheer size of the multiplicands is what takes the lead here. – John Dvorak Jan 03 '13 at 06:10
  • @JanDvorak True, I'm not sure why I thought python wouldn't utilize the same exponentiation algorithm for `**` as for `pow`. – Yuushi Jan 03 '13 at 06:14
  • Not the last "n" digits.. it just keeps calculations in Z/nZ. – Thomas Jan 03 '13 at 08:20