Multiply huge integers with thousands of digits

Question

If I have two large integers with thousands of digits, which instructions would be useful to multiply them? I'm assuming I will have to loop over the digits, multiply and add, like you do manual multiplication.

I don't expect any solutions here, just point me to the right instructions that would best accomplish the job. It can be any instruction from the MMX, SSE, x87 fpu etc...

`mul` should do the job just fine. It is likely useful to vectorise this in which case `pmuldq` will be useful. MMX and the x87 FPU will likely not be useful. — fuz, May 02 '20 at 09:38
How are they stored? Like fully packed binary with 64 value bits per 8-byte chunk? Or unpacked decimal digits with one digit per byte? Or Python-internals style with 30 bits per 32-bit integer to allow software carry techniques for languages with `adc`? See also [BigInt class in C++](https://codereview.stackexchange.com/a/237764) for a survey of some ways of doing BigInteger stuff. — Peter Cordes, May 02 '20 at 09:51
@PeterCordes I don't know yet how I should store them. I'm grateful for suggestions. I'm doing this as a self-imposed exercise to get started(again) with assembly. — Roland, May 02 '20 at 09:53
Generally even AVX2 is not useful because it doesn't provide carry across digits, and more importantly only has 32x32 => 64-bit multiply. [SIMD signed with unsigned multiplication for 64-bit \* 64-bit to 128-bit](https://stackoverflow.com/a/28811226). But there is stuff you can do with SIMD: [Can long integer routines benefit from SSE?](https://stackoverflow.com/a/8867025) - for large-enough numbers, FFT-based multiplication is apparently worth doing. — Peter Cordes, May 02 '20 at 09:54
Ok, to get started you definitely need to understand the simple way first, starting with building a 128x128-bit multiply out of 64x64 => 128-bit multiplies. Look at GCC code-gen for `unsigned __in128` multiply on x86-64: https://godbolt.org/z/bCf4a2, doing low*low, and cross products. Pretty sure there's a duplicate on SO somewhere the coverse this. See also https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-large-integer-arithmetic-paper.pdf which covers the standard way and shows speeding it up with `mulx` and `adc`. — Peter Cordes, May 02 '20 at 09:58
Maybe a duplicate of [can multiprecision signed multiply be performed with imul instruction?](https://stackoverflow.com/q/20772280) which has diagrams. plain `mul` produces the high half of the full result in RDX to support extend-precision stuff. — Peter Cordes, May 02 '20 at 10:02
Also, if you just need the job done efficiently, consider using something like `libgmp`. If you want to learn how to do it yourself, still read `libgmp` documentation. It discusses the algorithms used. That's a good starting point. — Margaret Bloom, May 02 '20 at 10:56
You can do this more efficiently [using FFTs](https://en.wikipedia.org/wiki/Schönhage–Strassen_algorithm) — Paul R, May 02 '20 at 11:13
A further benefit for libgmp is that it already has assembler code for many CPU architectures for the lowest level operations, and automatically switches to FFTs for large enough operands. — President James K. Polk, May 02 '20 at 11:21
@PaulR Not really. Schönhage-Strassen really only gives an advantage over simpler methods like Karatsuba's once you have many ten-thousand digits. And getting Karatsuba faster than a straight implementation of elementary school multiplication is not trivial either, given that you can easily vectorise the latter. — fuz, May 02 '20 at 12:04

Multiply huge integers with thousands of digits

0 Answers0