Performance penalty of using boost::irange over raw loop

Question

Several answers and discussions, and even the source code of boost::irange, mention that there should be a performance penalty for using these ranges instead of raw for loops.

However, for the following code, for example:

#include <boost/range/irange.hpp>

int sum(int* v, int n) {
    int result{};
    for (auto i : boost::irange(0, n)) {
        result += v[i];
    }
    return result;
}

int sum2(int* v, int n) {
    int result{};
    for (int i = 0; i < n; ++i) {
        result += v[i];
    }
    return result;
}

I see no differences in the generated (-O3 optimized) code (Compiler Explorer). Does anyone see an example where using such an integer range could lead to worse code generation in modern compilers?

EDIT: Clearly, debug performance might be impacted, but that's not my aim here. Concerning the strided (step size > 1) example, I think it might be possible to modify the irange code to more closely match the code of a strided raw for-loop.

Presumably the note in `irange.hpp` was written long ago, or refers to compilers other than those you tried. — Caleth, Nov 14 '19 at 11:06

user1810087 · Accepted Answer · 2020-01-10T11:10:13.567

Does anyone see an example where using such an integer range could lead to worse code generation in modern compilers?

Yes. The note does not claim that your particular case is affected. But try changing the step to anything other than 1:

#include <boost/range/irange.hpp>

int sum(int* v, int n) {
    int result{};
    for (auto i : boost::irange(0, n, 8)) {  // step of 8 instead of 1
        result += v[i];
    }
    return result;
}

int sum2(int* v, int n) {
    int result{};
    for (int i = 0; i < n; i += 8) {  // step of 8 instead of 1
        result += v[i];
    }
    return result;
}

Live.
While sum now looks worse (its loop did not get unrolled), sum2 still benefits from loop unrolling and SIMD optimization.

Edit:

To comment on your edit: it's true that it might be possible to modify the irange code to more closely match a strided raw for-loop. But:
To fit how range-based for loops are expanded, boost::irange(0, n, 8) must create some sort of temporary that implements begin/end iterators and a prefix operator++ (which is clearly not as trivial as an int += operation). Compilers rely on pattern matching for optimization, and those patterns are tuned to work with standard C++ and the standard library. Thus, if the code resulting from irange differs even slightly from a pattern the compiler knows how to optimize, the optimization won't kick in. I think this is why the author of the library mentions performance penalties.
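To make the expansion concrete, here is a rough sketch of what a strided integer range has to provide. Note that strided_range, sum_range_for, sum_desugared and sum_raw are names I made up for illustration; this is not boost's actual implementation and it assumes step > 0. Because a range-based for loop compares iterators with !=, the iterator cannot simply add 8 until it passes n (it could jump over the end), so this sketch counts steps and computes the value on dereference.

```cpp
// Hypothetical stand-in for a strided integer range (NOT boost's code).
// Assumes step > 0 and that (last - first + step - 1) does not overflow.
class strided_range {
public:
    class iterator {
    public:
        iterator(int first, int index, int step)
            : first_(first), index_(index), step_(step) {}

        // The value is recomputed from the step index on every dereference.
        int operator*() const { return first_ + index_ * step_; }

        // Prefix increment advances the step index, not the value itself.
        iterator& operator++() { ++index_; return *this; }

        // Range-based for only ever uses !=, never <.
        bool operator!=(const iterator& other) const {
            return index_ != other.index_;
        }

    private:
        int first_;
        int index_;
        int step_;
    };

    strided_range(int first, int last, int step)
        : first_(first), step_(step),
          // Number of steps, rounded up, so that begin() reaches end() exactly.
          count_(last > first ? (last - first + step - 1) / step : 0) {}

    iterator begin() const { return iterator(first_, 0, step_); }
    iterator end() const { return iterator(first_, count_, step_); }

private:
    int first_;
    int step_;
    int count_;
};

// What you write.
int sum_range_for(const int* v, int n) {
    int result{};
    for (auto i : strided_range(0, n, 8)) {
        result += v[i];
    }
    return result;
}

// Roughly what the compiler sees after expanding the range-based for loop.
int sum_desugared(const int* v, int n) {
    int result{};
    auto&& range = strided_range(0, n, 8);
    auto it = range.begin();
    auto last = range.end();
    for (; it != last; ++it) {
        int i = *it;
        result += v[i];
    }
    return result;
}

// The raw loop the optimizer recognizes directly.
int sum_raw(const int* v, int n) {
    int result{};
    for (int i = 0; i < n; i += 8) {
        result += v[i];
    }
    return result;
}
```

All three functions return the same sum, but the first two reach the optimizer as a division, a multiplication and a counter rather than as a plain i += 8 induction variable. Whether a given compiler sees through that is something to check on Compiler Explorer for your own toolchain.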
