
Consider the following two CPU operations:

sum = 1 + 1 + 1 + 1 + 1 + 1 + 1 + ... (n times)

and sum = 1 * n

Now I want to know whether these two operations differ in complexity and CPU time. Also, can I consider these atomic operations?

Daniel
me_digvijay
  • I guess the compiler will make its optimizations and they will take the same time. – Chuck Norris May 10 '12 at 06:27
  • 1
    Why is this tagged [atomic]? Are you asking about atomic RMW operations on `sum`? But no, you just have one assignment to `sum`, not n `++sum` operations. Atomicity in CPU architecture normally refers to operations on memory, because register state is thread-private but memory is shared between threads. – Peter Cordes Oct 28 '22 at 18:53

2 Answers


On an x86, ADD takes slightly less than one cycle on aggregate to perform, MUL takes roughly 2.7 cycles.

The optimiser in your compiler is pretty smart. If you're doing a multiplication that it can do faster using shifts and adds, it will.

Write your code to say what you're doing in the simplest possible way and the compiler and processor will reward you by giving you fast code.

SecurityMatt
  • `imul reg, reg, 1` has 1/clock throughput and 3 cycle latency on most modern x86 CPUs. (https://uops.info/ https://agner.org/optimize/) IDK where you got 2.7 from; maybe timing in reference cycles with `rdtsc`, with the core clock at a turbo frequency somewhat above the reference frequency? [CPU TSC fetch operation especially in multicore-multi-processor environment](https://stackoverflow.com/a/11060619) / [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/a/51907627) . Actual timing in core clock cycles can be done with performance counters, e.g. Linux `perf`. – Peter Cordes Oct 28 '22 at 18:58
  • Of course, `sum = n * 1` will optimize away to `sum = n`, which is free. – Peter Cordes Oct 28 '22 at 18:59
  • ADD has 1 cycle latency on basically all modern CPUs, but 4/clock throughput on modern x86. So if a compiler arranges things as `((1+1) + (1+1)) + ((1+1) + ...) + ....` it can expose enough instruction-level parallelism for the hardware to run multiple ADD operations per clock cycle. Obviously that's still much less efficient than just copying `n`, or even than multiplying by a constant other than `1` if you have more than 2 additions. – Peter Cordes Oct 28 '22 at 19:21

ADD and MUL are different operations.

In the first statement you have n ADD operations -> many cycles.

In the second you have one MUL operation -> few cycles.

However, it depends on the compiler how those statements will be executed. (It is possible that the compiler will replace 1 + 1 + ... + 1 with n at compile time, so the executable will do a single operation; in that case the compiler itself performs the n additions, at compile time.)

UPDATE: MUL instructions do exist in the x86 instruction set: http://en.wikipedia.org/wiki/X86_instruction_listings

Florin Ghita