3

Is there any example (e.g. on https://godbolt.org/ ) where CLang generates worse code when an algorithm expressed by pointer iterations instead of array indexes? E.g. it can vectorize/unfold in one case but can't in the other one?

In simple examples apparently it doesn't matter. Here is a pointer iteration style:

while (len-- > 0) {
  *dst++ = *src++;
}

Here is the logically same code in index style:

while (idx != len) {
  dst[idx] = src[idx];
  idx++;
}

Disregard any UB and/or off by one errors here.

Edit: the argument about indices being sugar is irrelevant, as desugraing doesn't change the algorithm style. So the following pointer based code is still in the index style:

while (idx != len) {
  *(dst + idx) = *(src + idx);
  idx++;
}

Note that the index-based loop has only 1 changing variable, while the pointer-based loop has 2, and the compiler must infer that they always change together.

You should look at this in the context of https://en.wikipedia.org/wiki/Induction_variable and https://en.wikipedia.org/wiki/Strength_reduction. Pointer style is essentially strength-reduced index-style, as addition is replaced by increments. And this reduction was beneficial for performance for some time, but no longer.

So my question boils down to if there are situations when this strength reduction cannot be performed or reversed by a compiler.

Another possible case is when indexes are not induction variables. So corresponding pointer code includes "arbitrary jumps" and it's somehow harder to transform the loop due to "history" of past iterations.

nponeccop
  • 13,527
  • 1
  • 44
  • 106
  • It depends. Your two examples are in principle identical (assuming `idx` starts at zero). If the compiler is smart enough, it will be able to transform one form into the other. Compilers are getting better all the time, so a newer version of Clang might be able to vectorize more. Maybe there is an example where a given version of Clang produces worse code in one form than in the other form, but the more interesting question is: in which cases wouldn't any compiler be able to vectorize one form, but would in the other form, and why? – G. Sliepen Dec 29 '19 at 21:38
  • GCC, MSVC and Clang all transform the first verson into the second one, with Clang also producing unrolled and vectorized code paths. As for why - one reason might be that pointers have contrived semantics due to optimization and portability needs. E.g. pointer subtraction can result in undefined behaviour, but index subtraction can't. There is also an urban legend that Fortran is easier to vectorize due to indexes, and so on. – nponeccop Dec 29 '19 at 22:42

1 Answers1

3

As long as no overloaded operator [] is involved, a subscript expression is literally defined to be identical to pointer arithmetic followed by dereferencing the result [expr.sub]/1. Thus, as long as both versions are indeed equivalent, compilers should generally be able to optimize both versions equally well (I'd probably go as far as considering a compiler's failure to optimize one but not the other a performance bug). That being said, note that there are lots of subtleties such as the wrap-around behavior of unsigned arithmetic that can make iterating over an index not exactly equivalent to iterating over a pointer…

Michael Kenzel
  • 15,508
  • 2
  • 30
  • 39