
Suppose I write:

int a = 111;
int b = 509;
int c = a * b;

So what is the time complexity of computing `a * b`? How is the multiplication operation executed?

user2560730
  • Probably [this](http://en.wikipedia.org/wiki/Multiplication_algorithm) might help you! – AllTooSir Jul 28 '13 at 14:44
  • Measured in terms of a and b. – user2560730 Jul 28 '13 at 14:45
  • @TheNewIdiot, the link doesn't mention what kind of multiplication process a CPU architecture uses while executing the `*` operator. Or maybe I missed it while speed reading. If you find it there, please post the link to the subsection. – user2560730 Jul 28 '13 at 14:52
  • It depends. What CPU? Typical CPUs have constant-time multiplication, however, some do not. – harold Jul 28 '13 at 14:55
  • My compiler compiles this code to something like `mov eax, 56499`. Does yours do something different? What about when optimizations are enabled? – Cody Gray - on strike Jul 28 '13 at 15:07
  • Multiplication of *fixed-size* numbers (such as `int` in most C++ implementations) is by definition a constant-time operation. The input is always of the same size (same number of bits), so computation time as a function of input has a hard upper bound. Multiplication of arbitrary-precision numbers is another matter. – n. m. could be an AI Jul 28 '13 at 15:11
  • @n.m. everything is either constant time or non-terminating on a computer with fixed-size pointers, because there can only be O(2^(2^pointersize)) total states since you can only address a fixed amount of memory, so it must either loop infinitely or stop after a fixed number of steps. – harold Jul 28 '13 at 15:14
  • Multiplication is not an algorithm. Multiplication is a map, often computed using some algorithm. Which algorithm are you asking about? – William Pursell Jul 28 '13 at 15:14
  • @harold Who uses such computers these days? My computer is the internet, it grows. – n. m. could be an AI Jul 28 '13 at 15:25

5 Answers


Compiling this function:

int f(int a, int b) {
    return a * b;
}

With `gcc -O3 -march=native -m64 -fomit-frame-pointer -S` gives me the following assembly:

f:
    movl    %ecx, %eax
    imull   %edx, %eax
    ret

The first instruction (`movl`) copies the first argument into the return register; the second instruction (`imull`) multiplies it by the second argument - then the result gets returned.

The actual multiplication is done with `imull`, which - depending on your CPU type - will take a certain number of CPU cycles.

If you look at Agner Fog's instruction timing tables you can see how much time each instruction will take. On most x86 processors it seems to be a small constant; however, the `imul` instruction on the AMD K8 with a 64-bit argument and result is listed as 4-5 CPU cycles. I don't know whether that's a measurement issue or genuinely variable time.

Also note that there are more factors involved than just the execution time. The integer has to be moved through the processor and get into the right place to be multiplied. All of this and other factors contribute to latency, which is also noted in Agner Fog's tables. There are further issues such as caching which also make life more difficult - it's not easy to say how fast something will run without actually running it.


x86 isn't the only architecture, and it's actually not inconceivable that there are CPUs and architectures out there with non-constant-time multiplication. This is especially important for cryptography, where algorithms using multiplication might be susceptible to timing attacks on those platforms.

orlp
  • This still doesn't **necessarily** mean that the CPU will run `imul` in the same number of clock cycles. –  Jul 28 '13 at 14:46
  • @H2CO3 I was still busy writing :) – orlp Jul 28 '13 at 14:47
  • Even x86 doesn't necessarily have a fixed timing: 80386 (and 80486) took a very variable amount of time for multiplication, but I don't remember any actual details about what it depended on. – harold Jul 28 '13 at 15:03
  • Though, as the question is worded, it's still correct. Since the time needed for multiplication is proportional to the length of the numbers and there is an upper bound (word size), there exists a constant upper bound for complexity, too. In other words, it's an O(1) operation. Whether that means 4 or 5 cycles or whether it takes more time in an odd moon phase is irrelevant for complexity. – Damon Jul 28 '13 at 16:46
  • @Damon well, somehow I have the feeling that the OP wanted a more practical answer rather than a pedantic (but of course correct) one. The time depends on the value in some way (the number of ones?) on 80386; that's a more useful fact than the trivial fact that *everything* is constant time (or non-terminating) on x86. – harold Jul 28 '13 at 18:30
  • @Damon When talking about a constant-time instruction it usually means that it will always take the same number of cycles, regardless of input - at least in cryptography. This is a different meaning of "constant time" than in algorithms theory. – orlp Jul 28 '13 at 18:45
  • @harold: Except the practical answer is equally pedantic and useless. It may take one cycle more or less to complete, fine. But instruction pipelining mostly makes this irrelevant, unless there is a long dependency chain. On the other hand, cache and TLB effects can easily make an instruction 50-80 times slower (including IMUL), regardless of the actual instruction's doing. This is true on many accounts for many instructions. Insofar, none of the answers is better or worse, or more correct or less so. – Damon Jul 28 '13 at 19:09
  • @Damon the variable timing applies to microarchs that did not have such newfangled considerations, and the difference isn't one or two cycles but about half an order of magnitude. – harold Jul 28 '13 at 19:21
  • @Damon: Useless? That's not for you to decide - for example, having a non-constant time instruction might bite you very hard in cryptography. In such a case a timing attack can be set up that depending on the exact circumstances might turn an unbreakable encryption algorithm into an algorithm of which the key can be retrieved within hours. – orlp Jul 28 '13 at 19:38
  • @Damon: And even outside of cryptography non-constant instructions might turn into huge performance issues: http://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x – orlp Jul 28 '13 at 19:42

Multiplication itself on most common architectures will be constant-time. The time to load the registers may vary depending on the location of the variables (L1, L2, RAM, etc.), but the number of cycles the operation takes will be constant. This is in contrast to operations like `sqrt` that may require additional cycles to achieve a certain precision.

You can get instruction costs for AMD, Intel, and VIA here: http://www.agner.org/optimize/instruction_tables.pdf

Anycorn

By time complexity, I presume you mean whether it depends on the number of digits in a and b - that is, whether the number of CPU clock cycles would vary between multiplying, say, 2 * 3 and 111 * 509. I think yes, they could vary, and it would depend on how that architecture implements the multiplication operation and how intermediate results are stored. Although there can be many ways to do this, one simple/primitive way is to implement multiplication using a binary adder/subtractor circuit: a * b is computed by adding a to itself b times using n-digit binary adders. Similarly, division a / b is subtracting b from a repeatedly until the result reaches 0, although this will take more space to store the quotient and remainder.
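The adder-based idea above can be sketched in code. This is only an illustration of the principle, not how hardware actually multiplies (real CPUs use parallel multiplier circuits): the first function is the naive repeated-addition scheme described above, and the second is the binary long-multiplication refinement, which needs one addition per set bit of b rather than one per unit of b.

```c
#include <stdint.h>

/* Naive repeated addition: the number of additions grows with the
   *value* of b, so 111 * 509 costs 509 additions. */
uint32_t mul_repeated_add(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    while (b--)
        result += a;
    return result;
}

/* Binary long multiplication (shift-and-add): one addition per set
   bit of b, so at most 32 additions for a 32-bit operand. */
uint32_t mul_shift_add(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    while (b) {
        if (b & 1)
            result += a;
        a <<= 1;
        b >>= 1;
    }
    return result;
}
```

Hardware multipliers are closer in spirit to the second version, but they compute the partial products in parallel, which is one reason the cycle count is constant on most modern CPUs.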

PKM
void myfun()
{
    int a = 111;
    int b = 509;
    int c = a * b;
}

The relevant part of the disassembly:

    movl    $111, -4(%ebp)
    movl    $509, -8(%ebp)
    movl    -4(%ebp), %eax
    imull   -8(%ebp), %eax

So, as you can see, it all depends on the `imull` instruction - specifically, on the fetch, decode, and execute cycle of the CPU.

P0W

In your example, the compiler would do the multiplication at compile time and your code would look like

int c = 56499;

If you changed your example to look like

int c = a * 509;

then the compiler MIGHT decide to rewrite your code like

int c = a * ( 512 - 2 - 1 );
int c = (a << 9) - (a << 1) - a;

I said MIGHT because the compiler will compare the cost of using shifts to the cost of a multiply instruction and pick the best option. Given a fast multiply instruction, that usually means only one or maybe two shifts will be faster.
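That shift rewrite can be checked directly. A minimal sketch, valid for non-negative a (left-shifting negative signed values is not well-defined in standard C, which is one reason the compiler performs this transformation at the machine level rather than in source):

```c
/* 509 = 512 - 2 - 1, so for non-negative a:
   a * 509 == (a << 9) - (a << 1) - a */
int mul509_shifts(int a) {
    return (a << 9) - (a << 1) - a;
}
```

For the question's values, `mul509_shifts(111)` gives 56499, matching 111 * 509.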

If your numbers are too large to fit in an integer (32 bits), then the arbitrary-precision math routines take between O(n log n) and O(n^2) time, where n is the number of 32-bit parts needed to hold the numbers.
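For illustration, the O(n^2) bound corresponds to the schoolbook method. A minimal sketch on little-endian arrays of 32-bit parts ("limbs"); the function name and layout here are invented for the example, and real libraries such as GMP use this only for small n, switching to Karatsuba and FFT-based algorithms as n grows:

```c
#include <stdint.h>
#include <stddef.h>

/* Schoolbook multiplication: every limb of a is multiplied by every
   limb of b, so the running time is O(na * nb) limb multiplications.
   out must have room for na + nb limbs. */
void bigmul(const uint32_t *a, size_t na,
            const uint32_t *b, size_t nb,
            uint32_t *out) {
    for (size_t i = 0; i < na + nb; ++i)
        out[i] = 0;
    for (size_t i = 0; i < na; ++i) {
        uint64_t carry = 0;
        for (size_t j = 0; j < nb; ++j) {
            /* 32x32 -> 64-bit product plus digit plus carry
               always fits in 64 bits. */
            uint64_t t = (uint64_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        out[i + nb] += (uint32_t)carry;
    }
}
```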

brian beuning
  • That information is a bit outdated. Modern CPUs generally perform a multiplication faster than those shift instructions. I think I remember measuring full-clock-speed multiplications on my relatively weak AMD CPU, and that was with 64 bits... – cmaster - reinstate monica Jul 28 '13 at 15:35