GCC, std::array and for vs ranged-based for

Question

I'm trying to use (large-ish) arrays of structs containing only a single small std::array each. Especially the K == 1 case should be well supported (see code). However, GCC seems to be unable to properly optimize these arrays in most cases, especially when using ranged-based for. Clang produces code that I don't fully understand but seems to be well optimized (and uses SSE, which GCC doesn't).

#include <vector>
#include <array>

template<int K>
struct Foo {
    std::array<int, K> vals;

    int bar() const {
        int sum = 0;
#if 0 // Foo Version A
        for (auto f : vals)
            sum += f;
#else // Foo Version B
        for (auto i = 0; i < K; ++i)
            sum += vals[i];
#endif
        return sum;
    }
};

int test1(std::vector<Foo<1>> const& foos)
{
    int sum = 0;
    for (auto const& f : foos)
        sum += f.bar();
    return sum;
}

// Version C
int test2(std::vector<std::array<int, 1>> const& foos)
{
    int sum = 0;
    for (auto const& f : foos)
        for (auto const& v : f)
            sum += v;
    return sum;
}

// Version D
int test3(std::vector<std::array<int, 1>> const& foos)
{
    int sum = 0;
    for (auto const& f : foos)
        for (auto i = 0; i < f.size(); ++i)
            sum += f[i];
    return sum;
}

Godbolt Code, gcc 7.2, flags -O2 -std=c++11 -march=native. Older gcc versions behave similarly.

If I'm not mistaken, then all four versions should have the same semantic.

Furthermore, I would expect all versions to compile to about the same assembly. The assembly should only have one frequently used conditional jump (for iterating over the vector).

However, the following happens:

Version A (ranged-based for, array-in-struct): 3 conditional jumps, one at the beginning for handling zero-length vectors. Then one for the vector (this is ok). But then another for iterating over the array? Why? It has constant size 1.
Version B (manual for, array-in-struct): Here, GCC actually recognizes that the array of length 1 can be optimized, the assembly looks good.
Version C (ranged-based for, direct array): The loop over the array is not optimized away, so again two conditional jumps for the looping. Also: this version contains more memory access that I thought would be required.
Version D (manual for, direct array): This one is the only version that looks sane to me. 11 instructions.

Clang creates way more assembly code (same flags) but it's pretty much the same for all versions and contains a lot of loop unrolling and SSE.

Is this a GCC-related problem? Should I file a bug? Is this something in my C++ code that I should/can fix?

EDIT: updated the godbolt url due to a fix in Version B. Behavior is now the same as Version D, which makes this a pure manual-for vs. ranged-for issue.

I think it's entirely a quality-of-implementation issue. This particular issue seems to have been fixed in GCC 8. — cpplearner, Sep 05 '17 at 11:01
Optimisation is generally related to quality of implementation, rather than something compilers are obligated to do. Failure to optimise to the extent you expect wouldn't normally be considered a bug. Also `-O2` typically only implements optimisations that don't involve a space/speed trade-off. If you are expecting higher levels of optimisation, use `-O3`. — Peter, Sep 05 '17 at 11:03
@Peter I expect that all versions basically generate the code of Version D which is imho not a space/speed trade-off. I would consider this a "missing optimization" kind of bug, not a bug kind of bug. Even though they aren't obligated to do so, it would be nice if manual for and ranged-based for would give the same (optimized) results in case of fixed-length arrays. — Artificial Mind, Sep 05 '17 at 11:07
As a QoI issue, perhaps iterating over a size 1 container isn't at the top of the list when adding optimizations for range-based for-statements? If I were a compiler implementer I would go for feature complete first, and worry about corner cases later. — Bo Persson, Sep 05 '17 at 11:16
To address the use of SSE-instructions: for a program I am working on, I need to manually add the corresponding switches, `-march=native` isn't enough. — camelCase, Sep 05 '17 at 11:23
@boper looping over arrays of arrays, where inner array is small, and unrolling inner array is a pretty top of list optimization. — Yakk - Adam Nevraumont, Sep 05 '17 at 11:24
This is probably caused by the same thing causing https://stackoverflow.com/questions/45496987/gcc-optimizes-fixed-range-based-for-loop-as-if-it-had-longer-variable-length — T.C., Sep 05 '17 at 11:25
@ArtificialMind - The reason such things are considered quality of implementation concerns is that different compilers are permitted to have different qualities. So, for a given task, one compiler might be better than another. The point is that you choose a compiler which meets your expectations, not that you deem a compiler to be buggy if it fails to meet your expectations. Because a compiler that doesn't meet your expectations may well meet the expectations of someone else who considers different qualities are more important. — Peter, Sep 05 '17 at 13:30
@peter Crappy optimization is a bug. Poor quality of implementation is a bug. Matching the C++ standard is not enough; by the C++ standard, a compiler can turn `1+2` into a for loop that first attempts to solve the collatz conjecture over 64 bit values without violating one iota of the standard. I have higher standards for real, practical C++ compilers than "doesn't violate the standard". — Yakk - Adam Nevraumont, Sep 05 '17 at 13:50
@Yakk - too bad compiler vendors don't agree with you, and they compete by offering products with different qualities, including what optimisations they do. Some basic optimisations (like turning `1+2` into `3`) are pretty common across compilers. But optimisations of loop constructs and handling of edge cases (like loops working with arrays that have one elements) vary between compilers. — Peter, Sep 05 '17 at 14:01
@Peter Given that John Zwinck [filed a bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81719) that was then fixed, I'd say the compiler vendors do agree with Yakk. — Barry, Sep 06 '17 at 15:54

GCC, std::array and for vs ranged-based for

0 Answers0