An example of an optimization that involves compiler reordering

Question

C & C++ compilers are allowed to reorder operations as long as the as-if rule holds. What is an example of such a reordering performed by a compiler, and what is the potential performance gain to be had by doing it?

Examples involving any (C/C++) compiler on any platform are welcome.

Have you seen http://preshing.com/20120625/memory-ordering-at-compile-time/? — NPE, Dec 23 '14 at 06:07
@NPE I demonstrated the first example working using godbolt [here](http://stackoverflow.com/a/25847445/1708801) — Shafik Yaghmour, Dec 23 '14 at 10:47
I did see that article, among others, but I couldn't find an explanation as to why the compiler reasoned that such an ordering could potentially improve performance. — TripShock, Dec 23 '14 at 20:08

Pradhan · Accepted Answer · 2014-12-23T06:14:27.910

Suppose you have the following operations being performed:

int i=0,j=0;
i++;
i++;
i++;
j++;
j++;
j++;

Ignoring for the moment that the three increments would likely be optimized away by the compiler into one +=3, you will end up having a higher processor-pipeline throughput if you reordered the operations as

i++;
j++;
i++;
j++;
i++;
j++;

since j++ doesn't have to wait for the result of i++ while in the previous case, most of the instructions had a data dependency on the previous instruction. In more complicated computations, where there isn't an easy way to reducing the number of instructions to be performed, the compiler can still look at data dependencies and reorder instructions so that an instruction depending on the result of an earlier instruction is as far away from it as possible.

Another example of such an optimization is when you are dealing with pure functions. Looking at a simple example again, assume you have a pure function f(int x) which you are summing over a loop.

int tot = 0;
int x;//something known only at runtime
for(int i = 0; i < 100; i++)
  tot += f(x);

Since f is a pure function, the compiler can reorder calls to it as it pleases. In particular, it can transform this loop to

int tot = 0;
int x;//something known only at runtime
int fval = f(x);
for(int i = 0; i < 100; i++)
  tot += fval;

Your first example points out reordering helps pipeline throughput. Tho, I thought such optimization can be done by CPU reordering too. If so, why does compiler bother to reorder? Thanks in advance. — HCSF, Nov 17 '21 at 00:53
@Pradhan Thanks for the answer, just wanted to mention that a pure function is a function that always return the same result for the same argument values — Guy Avraham, May 08 '22 at 04:55

score 5 · Answer 2 · answered Dec 23 '14 at 06:06

I'm sure there are quite a few examples where reordering operations will yield faster performance. An obvious example would be to reorder loads as early as possible, since these are typically much slower than other CPU operations. By doing other, unrelated work whilst the memory is being fetched, the CPU can save time overall.

That is, given something like this:

expensive_calculation();
x = load();
do_something(x);

We can reorder it like this:

x = load();
expensive_calculation();
do_something(x);

So while we're waiting for the load to complete, we can essentially do expensive_calculation() for free.

score 4 · Answer 3 · answered Dec 23 '14 at 06:11

Suppose you have a loop like:

for (i=0; i<n; i++) dest[i] = src[i];

Think memcpy. You might want the compiler to be able to vectorize this, i.e. load 8 or 16 bytes at a time and then store 8 or 16 at a time. Making that transformation is a reordering, since it would cause src[1] to be read before dest[0] is stored. Moreover, unless the compiler knows that src and dest don't overlap, it's an invalid transformation, i.e. one the compiler is not allowed to make. Use of the restrict keyword (C99 and later) allows you to tell the compiler that they don't overlap so that this kind of (extremely valuable) optimization is possible.

The same sort of thing arises all the time in operations on arrays that aren't just copying - things like vector/matrix operations, transformations of sound/image sample data, etc.

An example of an optimization that involves compiler reordering

3 Answers3