
I was experimenting with multiplying large numbers in Python. For my purposes, I was trying to evaluate

2*odd*(4*odd^2 + 7)/3 - 6*odd^2 - 3

Now my question boils down to how to multiply numbers faster in Python. Is it faster to do it with floats? The answer seems to be no. Take a simpler example:

n*(n+1)/2

My idea was that the following code would be faster:

product = 1
if n % 2 == 0:
    product *= n/2
    product *= n
else:
    product *= (n+1)/2
    product *= n
return product

Why would this be faster? Well, you would not have to divide the huge number n*(n+1) by two. However, one does waste a calculation checking the number modulo 2. Perhaps a try/except is faster?
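
For reference, this is roughly how I would time the two variants against each other (a rough timeit sketch, using // so it behaves the same on Python 2 and 3; the exact numbers will obviously depend on the machine):

import timeit

setup = "n = 10**100000"  # big, though smaller than my real numbers

# divide the huge product directly
naive = "n * (n + 1) // 2"
# mirror the if/else above: divide one factor first, then multiply
split = "(n // 2) * n if n % 2 == 0 else ((n + 1) // 2) * n"

print(timeit.timeit(naive, setup=setup, number=100))
print(timeit.timeit(split, setup=setup, number=100))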

So my question boils down to this: how does one compute products and divisions of very large numbers in Python? Here is my working code. I am not asking specifically for speed improvements on this code, but about the broader question of how to deal with division and multiplication of large numbers. My numbers are around 10^(10^6) at the moment.

def spiral_sum_fast(num):
    rem = num % 6
    if rem % 2 == 0:
        raise Exception("The side length must be an odd number")
    odd = 1 + num / 2            # Python 2: / on ints is floor division
    odd_squared = 2 * odd**2     # careful: this holds 2 * odd**2, not odd**2

    if rem % 3 == 0:
        temp = odd / 3
        temp *= 8 * odd_squared + 14
    else:
        temp = (4 * odd_squared + 7) / 3
        temp *= 2 * odd
    return temp - 3 * odd_squared - 3


if __name__ == '__main__':

    k = 10**(10**6) + 1
    spiral_sum_fast(k)
N3buchadnezzar
    That whole `product = 1; if...` block could just be `product = (n+n%2)/2*n`. – TigerhawkT3 May 24 '16 at 05:00
  • But is that faster for large numbers? – N3buchadnezzar May 24 '16 at 05:08
  • This might prove insightful and instructive http://stackoverflow.com/questions/538551/handling-very-large-numbers-in-python – glls May 24 '16 at 05:13
  • Well, I'm sure it's _slightly_ faster, as it never has to look up `product`, doesn't have to do an `if..else`, and so on, but it'll be a marginal increase. Eliminating the `product =` and just doing `return (n+n%2)/2*n` would be marginally faster still. There might be some more marginal improvements based on the order in which the compiler optimizes those operations. If you really need more performance, you might consider writing that part in a lower-level language like C. – TigerhawkT3 May 24 '16 at 05:22
  • Don't use try-except, throwing exceptions (and catching them, especially using an exception filter) is slow. – Byte Commander May 24 '16 at 05:38
  • @TigerhawkT3, (n+n%2)/2*n is not equal to (n+1)*n/2 if n is even. – Dmitry Rubanovich May 24 '16 at 06:33
  • @ByteCommander: That's not good advice. There are many good reasons to use try-except, and avoiding it for performance reasons is likely to be a case of premature optimization in most circumstances. http://stackoverflow.com/questions/16138232/is-it-a-good-practice-to-use-try-except-else-in-python – Mark Dickinson May 24 '16 at 07:03
  • @DmitryRubanovich - As far as I can tell, it's supposed to be `n` when `n` is even, `n+1` if `n` is odd. I don't think my change is inaccurate. – TigerhawkT3 May 24 '16 at 07:08
  • @MarkDickinson Yes, sorry if you took my sentence as general advice. It was just meant to apply to the specific situation described in the question, where there is little code and where the exception would not be "natural" but manually thrown just to control the program flow. As the focus here is on performance, instantiating a new exception object every time and throwing, filtering and catching it again is a massive, unnecessary overhead. – Byte Commander May 24 '16 at 07:08
  • @N3buchadnezzar: Short answer: use [gmpy2](https://pypi.python.org/pypi/gmpy2). It's designed for large-integer arithmetic, where Python's core algorithms are not. – Mark Dickinson May 24 '16 at 07:21

3 Answers


First of all, 4*odd^2 + 7 is always going to be 2 mod 3. So you don't need to check.

But you don't want to reorder the operations anyway. Consider what happens with integer arithmetic for 4*5/3 vs 4/3*5. The first result is 6, while the second is 5. With larger numbers you would lose even more information if you moved the division to an earlier position in the order of operations.
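
You can see this directly in the interpreter (using //, which is integer division on both Python 2 and 3):

>>> (4 * 5) // 3    # divide the full product: truncation happens once, at the end
6
>>> (4 // 3) * 5    # divide early: the truncation error gets multiplied up
5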

Another point: for squaring, use x*x rather than x**2. The ** operator goes through Python's general power machinery, so x*x is typically a little faster. (For Python integers both give an exact result; it is with C's pow() on floats that ** would also cost you accuracy.)
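
A quick (and unscientific) way to check the difference on your own machine is a timeit comparison like this:

import timeit

setup = "x = 10**100000"
print(timeit.timeit("x * x", setup=setup, number=1000))
print(timeit.timeit("x ** 2", setup=setup, number=1000))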

As for the example of n*(n+1)/2: n*(n+1) is always divisible by 2, so all you do is shift a large number one bit to the right, knowing that the bit you lose is 0.
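
In other words, the halving really is just a one-bit shift, and it is exact:

>>> n = 10**1000 + 1
>>> (n * (n + 1)) >> 1 == n * (n + 1) // 2    # one of n, n+1 is even, so no bit is lost
True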

It seems like you are generally worried about dividing large numbers by small ones. But (as in the first example I gave), if you do the division too early, you lose information and get less accurate answers. The calculation won't be faster, either. You might need tricks like this when dividing large numbers by large numbers, but if you are dividing large numbers by small numbers (and you are using integer arithmetic), the result will only be a few bits shorter than the large number you started with. It will be neither faster nor more accurate to "divide early" or to use floating-point arithmetic.
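
Floating point is not even an option at the sizes in the question: a C double tops out near 1.8e308, so a number anywhere near 10**(10**6) cannot be converted at all (Python 3 shown):

>>> float(10**400)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too large to convert to float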

Dmitry Rubanovich

In general you should use specialised libraries for specialised jobs. In your case, dealing with large numbers at optimal speed may well require this. Have a look at this blog, which deals with big numbers and compares the speed of pure Python against the GNU Multiple Precision Arithmetic Library used from Python, resulting in a big performance improvement.
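
As a minimal sketch of what that looks like (this assumes the gmpy2 package, installable with pip install gmpy2; mpz is its GMP-backed integer type and mixes freely with Python ints):

from gmpy2 import mpz

n = mpz(10)**(10**6) + 1        # roughly the magnitude from the question
triangular = n * (n + 1) // 2   # exact: n*(n+1) is always even
print(triangular.num_digits())  # length of the result in decimal digits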

Ton Plooij

The core of your question appears to be a variation of "how fast are numeric operations in Python?", and the answer to that depends on what kind of numbers you are working with and what kind of machine you are using.

There are 4 kinds of numbers in Python: https://docs.python.org/2/library/stdtypes.html#numeric-types-int-float-long-complex

The question of what kind of machine you are using affects whether floating-point math will be fast or slow. For the most part, all numeric work is eclipsed by the slowness of memory, but there are cases where the numbers you are working with and the instructions will all fit in cache and not be bounded by memory access time.

So, the next question is: since an "int" is really usually a 32-bit C long in Python, and a "long" is anything larger than that, are your ints really large enough to cross the 4B barrier into longs? From the sound of it, yes, so let's look at the implementation of Python's long object: https://github.com/python/cpython/blob/master/Objects/longobject.c

So, it's just a bunch of int mults. Admittedly, it's a linear chain of them, so, how long is that chain?

>>> k = 10**(10**6) + 1
>>> k.bit_length()/32
103810

Ok, the int chain is longer than the few hundred cycles each hit to main memory takes, so you might be hitting the point where you are actually CPU bound...

until you look at how big that number actually is.

>>> k.bit_length()/(8*1024)
405

405KB? How big are caches? http://www.intel.com/content/www/us/en/processors/core/6th-gen-core-family-mobile-u-y-processor-lines-datasheet-vol-1.html

So, L1, 64B? L2 is probably 256KB? L3 is the one that's probably around 4M. Even then - how many of these 405KB numbers are being kept around during the process? How much memory does python need to run? Are we thrashing the cache? How expensive is that?
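
One rough way to answer the footprint question is to ask CPython directly how many bytes one of these numbers occupies (a small sketch; the exact figure varies with the CPython version and build):

import sys

k = 10**(10**6) + 1
print(k.bit_length())    # about 3.3 million bits
print(sys.getsizeof(k))  # roughly 440 KB on a typical 64-bit CPython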

Approximate cost to access various caches and main memory?

This is as far as I'm digging into this hole for now, but the signs point to slowness from cache exhaustion. You are using big numbers. Big is more than a math concept; it pushes up against the physical reality of memory. The remaining profiling questions, "how many temp copies is Python keeping around?" and "how is the interpreter using the rest of memory?", are interesting, but beyond the scope of this answer.

Community