2

According to this question (that is, OP stated the belief and was not corrected) - chained operators grow to the left:

a << b << c << d;
// ==
operator<<( operator<<( operator<<(a, b), c ), d);

Why is this the case? Would it not be more efficient to do:

operator<<( operator<<(a, b), operator<<(c, d) );

i.e., to balance as much as possible?

Surely the compiler could figure this out for better runtime performance?

OJFord
  • What if the calls have side-effects? Such as `std::cout`? – Mysticial Jan 24 '15 at 01:02
  • @Mysticial Surely it would only ever be in the position of `a` above? And if anything was going to prevent 'balancing', wouldn't the compiler always know this in advance? – OJFord Jan 24 '15 at 01:07
  • Take the example of `cout << 1 << 2 << 3`. Under your proposal, that would mean the same as `(cout << 1) << (2 << 3)`, which is *not* what's generally expected or desired. – Josh Jan 24 '15 at 01:12
  • @Josh Agh - I see it now. For a longer chain though, could it not be worth constructing a stream to build next to the 'direct to cout' chain, and then `<<` them? e.g. `(cout << 1 << 2) << (created << 3 << 4)` – OJFord Jan 24 '15 at 01:15
  • @OllieFord: Consider: `32 / 2 / 2 / 2`. The right parsing is `((32/2)/2)/2 = 4`. With your "optimization" you've changed it to `(32/2)/(2/2)=16`. – Mooing Duck Jan 24 '15 at 01:20
  • @OllieFord Note that I/O is one of the slowest things for the CPU to process, so almost without exception the entire string will be accumulated *before* it is delivered to I/O. – Jonathan Mee Jan 24 '15 at 01:21
  • @MooingDuck Okay, I should have been more specific than just 'operator'! That issue doesn't exist for, say, `32*2*2*2` though? – OJFord Jan 24 '15 at 01:26
  • @OllieFord: The compiler _treats it as if_ it was left to right, but the code it generates almost always _does_ do the optimization you mention, _when it's safe_. (Note, it's usually not safe for floating point) – Mooing Duck Jan 24 '15 at 01:30

2 Answers

3

When you overload an operator in C++, it retains the same precedence and associativity that operator has when it's not overloaded.

In the case of the shift operators (`<<` and `>>`), the standard requires that:

The shift operators << and >> group left-to-right.

That means an operation like a<<b<<c must be parsed as (a<<b)<<c.

Overloading the operator changes the code that's invoked for each of those operations, but it has no effect on grouping: the grouping is the same whether a, b, and c are of built-in types (e.g., int), for which the compiler-provided operator is used, or of some class type, for which an overloaded operator is used. The parser that handles the grouping remains identical either way.

Note, however, that order of evaluation is independent of precedence or associativity, so this does not necessarily affect execution speed.

Jerry Coffin
1
  1. Figuring this out for "better runtime performance" is better served by turning operations into CISC-style instructions. (MULADDs, for example.)
  2. The synchronization cost between two processors would far outweigh the benefits for any reasonable number of operators crammed into one line.
  3. This would render out-of-order processing impossible, effectively making each core operate at the speed of a Pentium Pro.
  4. Aside from the `std::cout` case mentioned by Mysticial, think of the following integer arithmetic problem: `3 * 5 / 2 * 4`. That should result in `(15 / 2) * 4`, meaning 28, but if split out it would result in `15 / 8`, meaning 1. This is true even though the operators have the same precedence.

EDIT:

It's easy to think that we could get away with splitting commutative operations across cores, for example 3 * 5 * 2 * 4.

Now let's consider that the only way to communicate between cores is through shared memory, and that the average load time is an order of magnitude slower than the average CPU operation. That alone means a tremendous number of mathematical operations would need to be saved before accommodating even one load would make sense. But it gets worse:

  1. The master core must write the data and operations to be performed to memory. Because these must be separate there could be a page fault on write.
  2. After the master core has finished its work, it must load a dirty bit to check whether the slave core has finished processing. That core could be busy or have been preempted, so this could require multiple loads.
  3. The result from the slave core must then be loaded.

You can see how bad that would get, given the expense of loads. Let's look at optimizations that can be done if everything is kept on one core:

  1. If some values are known ahead of time, they can be factored into powers of 2 and shifts can be done instead. Even better, a shift and an add, potentially performing more than one operation at a time.
  2. For 32-bit or smaller numbers, the upper and lower halves of a 64-bit register may be used for addition.
  3. Out-of-order processing may be done at the CPU's discretion. For example, floating point operations are slower than integer operations, so given `1.2 * 3 * 4 * 5` the CPU will do `3 * 4 * 5` first and perform only one floating point operation.
Jonathan Mee
  • Thanks for the additional points. I really only meant for like operators, as that obviously causes a problem with arithmetic, but I realise now I didn't specify. What about `(3*5) * (2*4)` then, since that clearly doesn't have the `cout` issue, and if they were instead variables it could quite realistically be a long multiplication chain? – OJFord Jan 24 '15 at 01:24
  • @OllieFord OK there's probably more Computer Architecture class than you requested. Enjoy! – Jonathan Mee Jan 24 '15 at 01:59
  • 1
    That last point there about OoO is not how OoO works, the compiler can maybe do it though. If the compiler emits floating point instructions, they will really happen. And they can't be reordered using associativity rules. The only reordering that a OoO processor can do is executing instructions in some valid data-flow order, ie executing instructions some time after their operands become available. – harold Jan 24 '15 at 18:31
  • @harold Ugh, just looked over what I said and you are totally right. Well put sir. – Jonathan Mee Jan 24 '15 at 18:36