
I took a course on Computer Architecture where it was mentioned that, on the most efficient processors with an n-bit architecture word size, addition/subtraction of two words has a time complexity of O(log n), while multiplication/division has a time complexity of O(n).

If you do not fix any particular architecture word size, the best time complexity of addition/subtraction is O(n) (https://www.academia.edu/42811225/Fast_Arithmetic_Speeding_up_Multiplication_Division_and_Addition_of_n_Bit_Numbers), and multiplication/division seems to be O(n log n log log n) (Schönhage–Strassen, https://en.m.wikipedia.org/wiki/Multiplication_algorithm).

Is this correct?

zeitgeist

1 Answer


O(log n) is the latency of addition if you can use n-bit wide parallel hardware with stuff like carry-select or carry-lookahead.
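To make that concrete, here is a minimal software sketch (not any real library's code) of the same parallel-prefix trick a carry-lookahead / Kogge–Stone adder uses: the carry bits are derived from per-bit generate/propagate signals in log2(64) = 6 doubling steps instead of rippling one bit at a time.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical demo of carry-lookahead addition on one 64-bit word:
 * carries are computed by a Kogge-Stone-style parallel prefix over
 * generate/propagate bits, so the logic depth is O(log n) instead of
 * the O(n) of a ripple-carry chain. */
static uint64_t add_lookahead(uint64_t a, uint64_t b)
{
    uint64_t g = a & b;   /* generate: this bit produces a carry      */
    uint64_t p = a ^ b;   /* propagate: this bit passes a carry along */

    /* log2(64) = 6 prefix steps; each step doubles the carry distance. */
    for (int shift = 1; shift < 64; shift <<= 1) {
        g = g | (p & (g << shift));
        p = p & (p << shift);
    }

    uint64_t carries = g << 1;   /* carry into each bit position   */
    return (a ^ b) ^ carries;    /* sum = propagate XOR carries-in */
}

int main(void)
{
    uint64_t a = 0x123456789abcdef0ULL, b = 0x0fedcba987654321ULL;
    printf("%d\n", add_lookahead(a, b) == a + b);  /* prints 1 */
    return 0;
}
```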

O(n) is the total amount of work that needs doing, and thus the time complexity with a fixed-width ALU for arbitrary bigint problems as n tends towards infinity.
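For comparison, a bigint add on a fixed-width ALU really does perform O(n) word-sized operations, one per limb. A rough sketch, assuming a made-up little-endian limb layout:

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal sketch of bigint addition on a fixed-width ALU: one 64-bit add
 * per limb, with the carry rippled from limb to limb, so the total work
 * grows linearly with the number of bits. */
static uint64_t bigint_add(uint64_t *dst, const uint64_t *a,
                           const uint64_t *b, size_t limbs)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < limbs; i++) {   /* O(n/64) word operations */
        uint64_t s = a[i] + carry;
        uint64_t c1 = s < carry;           /* carry out of a[i] + carry */
        dst[i] = s + b[i];
        carry = c1 + (dst[i] < s);         /* carry out of the full sum */
    }
    return carry;                          /* final carry-out */
}
```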


For a multiply, an n-bit multiply produces n partial products, so adding them all (with a Dadda tree for example) takes on the order of O(log n) gate delays of latency. Integer addition is associative, so you can do the summation in parallel, e.g. (a+b) + (c+d) is 3 additions with the latency of only 2, and it gets better from there; see the sketch below.
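Here is a software model of that tree reduction (the `terms` array and its length are made up for illustration; the point is only the dependency depth):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of a tree reduction: because addition is associative, n terms
 * can be summed pairwise in ceil(log2(n)) rounds. In hardware all the
 * adds within a round run in parallel, so the latency is O(log n) even
 * though the total work is still n-1 additions. */
static uint64_t tree_sum(uint64_t *terms, size_t n)
{
    while (n > 1) {                       /* one round per tree level */
        size_t half = (n + 1) / 2;
        for (size_t i = 0; i < n / 2; i++)
            terms[i] = terms[2 * i] + terms[2 * i + 1];  /* independent adds */
        if (n & 1)
            terms[n / 2] = terms[n - 1];  /* odd term moves to the next round */
        n = half;
    }
    return terms[0];
}
```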

Dadda trees can avoid some of the carry-propagation latency, so I guess they avoid the extra factor of log n you'd get if you just used normal addition of each partial product separately.
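The building block that makes this possible is the carry-save adder (3:2 compressor). A rough sketch of the idea, assuming 64-bit words:

```c
#include <stdint.h>

/* Sketch of the carry-save idea behind Wallace/Dadda trees: a 3:2
 * compressor turns three addends into a sum word and a carry word with
 * only constant gate depth (no carry propagation), so the expensive
 * carry-propagating add is only needed once, at the very end. */
static void csa_3to2(uint64_t a, uint64_t b, uint64_t c,
                     uint64_t *sum, uint64_t *carry)
{
    *sum   = a ^ b ^ c;                           /* per-bit sum, no carries */
    *carry = ((a & b) | (a & c) | (b & c)) << 1;  /* carries, shifted left   */
}

/* Invariant: a + b + c == *sum + *carry, and that final '+' is the single
 * carry-propagating add. */
```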

See Differences between Wallace Tree and Dadda Multipliers for more about practical considerations for huge Dadda trees.

Peter Cordes
  • I read https://stackoverflow.com/questions/54374925/differences-between-wallace-tree-and-dadda-multipliers and now I am confused about whether, with an `n`-bit architecture word size, multiplication/division of two words is possible with a time complexity of `O(log n)`. Is there any actual hardware achieving this? – zeitgeist Jun 21 '22 at 18:39
  • 1
    @zeitgeist: Oh right, I was half asleep when writing that earlier. O(log n) sounds right, or possibly `O( (log n)^2 )`. As far as real hardware goes, no real CPU has a bit-width anywhere near infinity. But modern high-performance x86 CPUs typically have a 64-bit multiply latency of 3 cycles, vs. an integer add latency of 1 cycle. I don't know how many gate delays that is; I haven't checked the literature. I find Alain's suggestion surprising, that modern architectures don't care much about latency; CPUs certainly do, especially for scalar integer. They also care about throughput, of course. – Peter Cordes Jun 21 '22 at 22:48