You're not going to do better than ((a * x + b) % P) % m
in pure Python code; the overhead of the Python interpreter is going to bottleneck you more than anything else; yes, if you ensure the m
is a power of two, you can precompute mm1 = m - 1
and change the computation to ((a * x + b) % P) & mm1
, replacing a more expensive remaindering operation with a cheaper bitmasking operation, but unless P
is huge (hundreds of bits minimum), the interpreter overhead will likely outweigh the differences between remainder and bitmasking.
If you really need the performance, and the types you're working with will fit in C level primitive type, you may benefit from writing a Python C extension that converts all the values to size_t
, Py_hash_t
, uint64_t
, or whatever suits your problem and performs the math as a set of bulk conversions to C types, C level math, then a single conversion back to Python int
, saving a bunch of byte code and intermediate values (that are promptly tossed).
If the values are too large to fit in C primitives, GMP types are an option (look at mpz_import
and mpz_export
for efficient conversions from PyLong
to mpz_t
and back), but the odds of seeing big savings go down; GMP does math faster in general, and can mutate numbers in place rather than creating and destroying lots of temporaries, but even with mpz_import
and mpz_export
, the cost of converting between Python and GMP types would likely eat most of the savings.