
Consider the following two CPU operations:

sum = 1 + 1 + 1 + 1 + 1 + 1 + 1 + ... (n times)

and sum = 1 * n

Now I want to know whether these two operations differ in complexity and CPU time. Also, can I consider these atomic operations?

Daniel
me_digvijay
  • I guess the compiler will make its optimizations and they will take the same time. – Chuck Norris May 10 '12 at 06:27
  • 1
    Why is this tagged [atomic]? Are you asking about atomic RMW operations on `sum`? But no, you just have one assignment to `sum`, not n `++sum` operations. Atomicity in CPU architecture normally refers to operations on memory, because register state is thread-private but memory is shared between threads. – Peter Cordes Oct 28 '22 at 18:53

2 Answers


On an x86, ADD takes slightly less than one cycle on aggregate to perform, MUL takes roughly 2.7 cycles.

The optimiser in your compiler is pretty smart. If you're doing a multiplication that it can do faster using shifts and adds, it will.

Write your code to say what you're doing in the simplest possible way and the compiler and processor will reward you by giving you fast code.

SecurityMatt
  • `imul reg, reg, 1` has 1/clock throughput and 3 cycle latency on most modern x86 CPUs. (https://uops.info/ https://agner.org/optimize/) IDK where you got 2.7 from; maybe timing in reference cycles with `rdtsc`, with the core clock at a turbo frequency somewhat above the reference frequency? [CPU TSC fetch operation especially in multicore-multi-processor environment](https://stackoverflow.com/a/11060619) / [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/a/51907627) . Actual timing in core clock cycles can be done with performance counters, e.g. Linux `perf`. – Peter Cordes Oct 28 '22 at 18:58
  • Of course, `sum = n * 1` will optimize away to `sum = n`, which is free. – Peter Cordes Oct 28 '22 at 18:59
  • ADD has 1 cycle latency on basically all modern CPUs, but 4/clock throughput on modern x86. So if a compiler arranges things as `((1+1) + (1+1)) + ((1+1) + ...) + ....` it can expose enough instruction-level parallelism for the hardware to run multiple ADD operations per clock cycle. Obviously that's still much less efficient than just copying `n`, or even than multiplying by a constant other than `1` if you have more than 2 additions. – Peter Cordes Oct 28 '22 at 19:21

ADD and MUL are different operations.

In the first statement you have n ADD operations -> many cycles.

In the second you have one MUL operation -> few cycles.

However, it depends on the compiler how those statements will be executed. (It is possible that the compiler will replace 1 + 1 + ... + 1 with n at compile time, so the executable will do a single operation; in that case the compiler itself performs the n additions, at compile time.)

UPDATE: MUL instructions do exist in the x86 instruction set: http://en.wikipedia.org/wiki/X86_instruction_listings

Florin Ghita