4

I was watching some lecture on algorithms, and the professor used multiplication as an example of how naive algorithms can be improved...

It made me realize that multiplication is not that obvious, although when I am coding I just consider it a simple atomic operation, multiplication requires a algorithm to run, it does not work like summing numbers.

So I wonder, what algorithm modern desktop processors actually use? I guess they don't rely on logarithm tables, and don't make loops with thousands of sums...

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
speeder
  • 6,197
  • 5
  • 34
  • 51
  • 11
    See: https://en.wikipedia.org/wiki/Binary_multiplier + https://en.wikipedia.org/wiki/Wallace_tree + https://en.wikipedia.org/wiki/Dadda_multiplier – Paul R Oct 14 '14 at 21:11
  • 1
    You're going to need to specify "integer" or "floating point"; and possibly which CPU. Note that for floating point the exponents are added and the significands are multiplied, and the significands are mostly in the range from 1.0 to 1.9999* which makes them "more suited" to approaches that don't make sense for integers.. – Brendan May 31 '20 at 04:02
  • You need to pick up a specific CPU and look into its **full datasheet** (I mean 10MByte or bigger PDFs). They sometimes mention stuff like this. But to know for sure you would need either ask that CPU developers/manufactors or [inspect its die](http://www.righto.com/2013/09/the-z-80-has-4-bit-alu-heres-how-it.html) the [Visual6502](http://www.visual6502.org/) contain shots of a lot of chips pick one... The older Integer multipliers used Shift and Add multiplication now who knows (LUTs, Approximations, Karatsuba ...) – Spektre May 31 '20 at 08:48
  • In case you did mean FP: a lot of an FP multiplier is the integer mantissa multiplier. Adding the exponents is straightforward, and renormalizing the mantissa result is only an extra 0 or 1 offset to the exponent, assuming normalized inputs and result. – Peter Cordes Jun 01 '20 at 06:44
  • The internal design of an ALU can be part of the secret sauce of what makes a CPU fast and power-efficient. It's interesting to CPU architects, but totally irrelevant to software, even low-level hand tuning of asm. All that matters is the performance: latency and throughput. That said, it is a programming question insofar as hardware design counts as programming, but it's asking about how existing designs were done. – Peter Cordes Jun 05 '20 at 04:38

3 Answers3

7

Mitch Alsup (who worked on Motorola 88K, Ross SPARC, AMD x86, etc.) has stated on the comp.arch newsgroup:

All modern multiplier designers use the Dadda method for building the tree.

(Message-ID: <c45d9d2e-039d-4085-a617-d90f7a3b1f93@googlegroups.com> — 14 December 2018)

and (with respect to availability of recent references for what multiplication mechanisms are used by AMD/Intel/NVIDIA):

Only in the patent office.

(Message-ID: <d92d1961-a3e4-441e-8b3d-b9ce6bd24b58@googlegroups.com> — 14 January 2020)

See Wikipedia for information on Dadda tree multipliers.

  • 2
    (To mention time frames: the processor designs named are from the 1990s, Mr. Alsup was chief architect of the Athlon K9 around the turn of the millennium and retired about 2015. The messages are from Dec '18 and Jan '20.) – greybeard Jun 03 '20 at 05:56
1

There are an array of multiplication procedures that can be used in a CPU. E.g., Booth multiplier for 2's complement binary numbers taught in most computer architecture/organization course. Multiplication in binary is simpler than multiplication in decimal representation. Calculating partial products is simple. The multiplicand,M,(if multiplier bit is 1) or 0(if a multiplier bit is 0). Compare that with decimal where it could be anything between (0*M to 9*M). Whenever somebody design a custom CPU (like soft-core on a FPGA) a developer can choose appropriate multiplication procedures. Some commonly used are CORDIC multipliers, Radix-2,Radix-4,Radix-8... booth multipliers. All this multiplier algorithm generate partial product (like the manual multiplication). All the partial products are added to have the final product. There are multiple ways this is done in modern processors like using Dadda multiplier, Wallace tree.

Put simply, every processor designer can use any multiplier algorithm to generate partial products and add partial products to get final results. Depending on binary representation used internally in the processor registers, register size in bits, the most optimum algorithm will vary.

ajit
  • 410
  • 3
  • 9
  • (I guess at least the above was well known to speeder by the time Paul A. Clayton answered (see [Paul R's comment](https://stackoverflow.com/questions/26370287/how-modern-x86-processors-actually-compute-multiplications/62228203#comment41397960_26370287)). And the real question is *What multipliers **are** general purpose processors using "currently"?* - a moving target, obviously. And Mr. Alsup's advice to check patent applications & (infringement) challenges as well as grants may be your best bet to find out.) – greybeard Jun 06 '20 at 07:10
  • 1
    I elaborated as speeder mentioned "I guess they don't rely on logarithm tables, and don't make loops with thousands of sums...". My answer explains that it will not be "thousands of sum" , only N or less partial products will be added for N bit operands/registers. If CORDIC implementation is used than LUT as suggested by Billy Bob would come into play. – ajit Jun 06 '20 at 09:24
  • A quick Google search found https://www.dsprelated.com/thread/3151/details-of-a-cordic-multiplier which links [A survey of CORDIC algorithms for FPGA based computers](http://www.andraka.com/files/crdcsrvy.pdf). +1, this answer does go into more detail than just saying everything ultimately uses Dadda trees. – Peter Cordes Jun 06 '20 at 11:59
-2

Modern processors have on-board math co-processors. I believe they contain LUTs (look up tables) for multiplication.

Billy_Bob
  • 85
  • 7
  • Can someone confirm, or not, what this guy just wrote? – speeder Oct 16 '14 at 04:28
  • 1
    @speeder no one could as you did not specify the CPU and operation... That is why no high rep user is answering ... as without specification is your question unanswerable – Spektre May 31 '20 at 08:52