
In this answer, somebody writes:

[..] most compilers won't optimize a + b + c + d to (a + b) + (c + d) (this is an optimization since the second expression can be pipelined better)

The original question was about how certain expressions involving float values can or cannot be reordered due to the imprecision of floating-point arithmetic.

I'm more interested in the quoted part, though: why, say with unsigned int values, would it be easier to generate code that exploits CPU pipelines if a+b+c+d is rewritten as (a+b)+(c+d)?

Frerich Raabe

3 Answers


a+b and c+d can be calculated in parallel.

Like this:

x = a+b
y = c+d
return x+y // requires x and y

vs

x = a+b
y = x+c // requires x
return y+d // requires y (and thus x)

When calculating y, one has to wait for the result of x to come in first: there is a data dependency between them. See Instruction-level parallelism on Wikipedia.
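
As a rough C sketch of the same two orderings (the function names here are made up for illustration), a compiler sees something like:

unsigned int sum_chain(unsigned int a, unsigned int b,
                       unsigned int c, unsigned int d)
{
    return ((a + b) + c) + d;   /* each addition waits for the previous one */
}

unsigned int sum_tree(unsigned int a, unsigned int b,
                      unsigned int c, unsigned int d)
{
    return (a + b) + (c + d);   /* a+b and c+d have no dependency on each other */
}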

johv

With unsigned int? It wouldn't. Integer operations can be reordered freely without any risk of affecting the result, so any half-decent compiler should generate the same code for both expressions; the two groupings only mean something different when floats are involved.
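
One way to convince yourself for unsigned types is a tiny self-check like the following sketch (the example values are arbitrary); because unsigned arithmetic is modulo 2^N, both groupings give the same result even when the intermediate sums wrap around:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int a = UINT_MAX, b = 5u, c = UINT_MAX, d = 7u;

    unsigned int left = ((a + b) + c) + d;   /* a+b+c+d as C parses it    */
    unsigned int tree = (a + b) + (c + d);   /* the reassociated grouping */

    printf("%u %u -> %s\n", left, tree, left == tree ? "equal" : "differ");
    return 0;
}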

Alex Celeste
  • Except that C specifies the associativity of the binary `+` operator to be left-to-right, so the compiler is _not_ free to reorder the operations. (See, e.g., [Operators in C and C++](http://en.wikipedia.org/wiki/Operators_in_C_and_C++).) The compiler must compile `a + b + c + d` as if it were written `((a + b) + c) + d` – Ted Hopp Nov 25 '13 at 22:17
  • 4
    That only restricts the compiler's freedom when there are potential side effects or when the operation is not fully associative (as in floating-point arithmetic). The compiler is always free to reorder things when it will not affect the outcome. – user57368 Nov 25 '13 at 22:21
  • 1
    Unsigned integer operations can be reordered freely (subject to the appropriate mathematics), because they use modulo arithmetic. Signed integer operations cannot be reordered so freely unless the C implementation makes guarantees beyond what the C standard requires (which it well might). E.g., with `int`, `a+(b+c)` might overflow when `a+b+c` would not. – Eric Postpischil Nov 25 '13 at 23:41
  • Unless you're programming for a DSP, you're not likely to have hardware that performs saturating integer arithmetic (except when explicitly requested), so even signed operations will still be completely associative. – user57368 Nov 26 '13 at 08:16
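
To illustrate the signed-overflow caveat raised in the comments above, here is a minimal sketch (the values are chosen purely for illustration):

#include <limits.h>

int left_to_right(void)
{
    int a = -1, b = INT_MAX, c = 1;
    return (a + b) + c;   /* (-1 + INT_MAX) + 1 == INT_MAX, no overflow */
}

/* Reassociating the same expression would evaluate INT_MAX + 1 first:
 *     a + (b + c)  ->  -1 + (INT_MAX + 1)
 * Signed overflow is undefined behaviour in C, so a compiler may only
 * reassociate signed additions when it can prove the result is unaffected. */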

If your compiler generates an SSA intermediate representation, it might come out looking like:

AB = a + b;
ABC = AB + c;
ABCD = ABC + d;

in the first case, and:

AB = a + b;
CD = c + d;
ABCD = AB + CD;

In the first case, each term depends on the previous one, so even if the processor is capable of executing several additions at once, it has to wait for the result of the previous operation before starting the next. In the second case, a processor like a modern x86 with multiple ALU pipelines can calculate AB and CD independently and simultaneously.
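
The same idea carries over to longer sums. As a hedged sketch (the array, length, and function name are hypothetical), keeping two independent accumulators shortens the dependency chain in the same way that rewriting a+b+c+d as (a+b)+(c+d) does:

#include <stddef.h>

unsigned int sum_two_chains(const unsigned int *v, size_t n)
{
    unsigned int sum0 = 0, sum1 = 0;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        sum0 += v[i];       /* these two additions do not depend on   */
        sum1 += v[i + 1];   /* each other and can execute in parallel */
    }
    if (i < n)
        sum0 += v[i];       /* handle an odd final element */

    return sum0 + sum1;
}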

sqykly
  • +1 - recognizing and exploiting this kind of parallelism makes the work done by people who work on the x86 code generation part of compilers even more impressive. – Frerich Raabe Nov 25 '13 at 22:25