8

I'm trying to create a local array of some POD values (e.g. double) with fixed max_size that is known at compile time, then read a runtime size value (size <= max_size) and process first size elements from that array.

The question is, why doesn't compiler eliminate stack reads and writes when arr and size are placed into the same struct/class, as opposed to the case where arr and size are independent local variables?

Here's my code:

#include <cstddef>
constexpr std::size_t max_size = 64;

extern void process_value(double& ref_value);

void test_distinct_array_and_size(std::size_t size)
{
    double arr[max_size];
    std::size_t arr_size = size;

    for (std::size_t i = 0; i < arr_size; ++i)
        process_value(arr[i]);
}

void test_array_and_size_in_local_struct(std::size_t size)
{
    struct
    {
        double arr[max_size];
        std::size_t size;
    } array_wrapper;
    array_wrapper.size = size;

    for (std::size_t i = 0; i < array_wrapper.size; ++i)
        process_value(array_wrapper.arr[i]);
}

Assembly output for test_distinct_array_and_size from Clang with -O3:

test_distinct_array_and_size(unsigned long): # @test_distinct_array_and_size(unsigned long)
  push r14
  push rbx
  sub rsp, 520
  mov r14, rdi
  test r14, r14
  je .LBB0_3
  mov rbx, rsp
.LBB0_2: # =>This Inner Loop Header: Depth=1
  mov rdi, rbx
  call process_value(double&)
  add rbx, 8
  dec r14
  jne .LBB0_2
.LBB0_3:
  add rsp, 520
  pop rbx
  pop r14
  ret

Assembly output for test_array_and_size_in_local_struct:

test_array_and_size_in_local_struct(unsigned long): # @test_array_and_size_in_local_struct(unsigned long)
  push r14
  push rbx
  sub rsp, 520
  mov qword ptr [rsp + 512], rdi
  test rdi, rdi
  je .LBB1_3
  mov r14, rsp
  xor ebx, ebx
.LBB1_2: # =>This Inner Loop Header: Depth=1
  mov rdi, r14
  call process_value(double&)
  inc rbx
  add r14, 8
  cmp rbx, qword ptr [rsp + 512]
  jb .LBB1_2
.LBB1_3:
  add rsp, 520
  pop rbx
  pop r14
  ret

Latest GCC and MSVC compilers do basically the same thing with the stack reads and writes.

As we can see, reads and writes to the array_wrapper.size variable on the stack are not optimized away in the latter case. There is a write of size value into location [rsp + 512] before the beginning of the loop, and a read from that location after each iteration.

So, the compiler kinda expects that we'd want to modify array_wrapper.size from the process_value(array_wrapper.arr[i]) call (by taking the address of the current array element and applying some weird offsets to it?)

But, if we tried to do so from that call, woudn't that be undefined behavior?

When we rewrite the loop in the following manner

for (std::size_t i = 0, sz = array_wrapper.size; i < sz; ++i)
    process_value(array_wrapper.arr[i]);

, those unnecessary reads at the end of each iteration will be gone. But the initial write to [rsp + 512] will remain, meaning that compiler still expects us to be able to access the array_wrapper.size variable in that location from these process_value calls (by doing some weird offset-based magic).

Why?

Is that but a little shortcoming in modern compilers' implementations (that hopefully will be fixed soon)? Or does the C++ standard indeed require such behavior that leads to generation of less efficient code whenever we put an array and its size into the same class?

P.S.

I realize that my code example above might seem a bit contrived. But consider this: I'd like to use a lightweight boost::container::static_vector-like class template in my code for safer and more convenient "C++-style" manipulations with pseudo-dynamic arrays of POD elements. So my PODVector will contain an array and a size_t in the same class:

template<typename T, std::size_t MaxSize>
class PODVector
{
    static_assert(std::is_pod<T>::value, "T must be a POD type");

private:
    T _data[MaxSize];
    std::size_t _size = 0;

public:
    using iterator = T *;

public:
    static constexpr std::size_t capacity() noexcept
    {
        return MaxSize;
    }

    constexpr PODVector() noexcept = default;

    explicit constexpr PODVector(std::size_t initial_size)
        : _size(initial_size)
    {
        assert(initial_size <= capacity());
    }

    constexpr std::size_t size() const noexcept
    {
        return _size;
    }

    constexpr void resize(std::size_t new_size)
    {
        assert(new_size <= capacity());
        _size = new_size;
    }

    constexpr iterator begin() noexcept
    {
        return _data;
    }

    constexpr iterator end() noexcept
    {
        return _data + _size;
    }

    constexpr T & operator[](std::size_t position)
    {
        assert(position < _size);
        return _data[position];
    }
};

Usage:

void test_pod_vector(std::size_t size)
{
    PODVector<double, max_size> arr(size);

    for (double& val : arr)
        process_value(val);
}

If the issue described above is indeed forced by C++'s standard (and is not the fault of compiler writers), such PODVector will never be as efficient as raw usage of an array and an "unrelated" variable for size. And this would be quite bad for C++ as a language which wants zero-overhead abstractions.

Taras
  • 488
  • 3
  • 15
  • It's definitely not the C++ standard as the latter allows practically everything by the as-if rule. – bipll Jan 17 '18 at 14:18
  • they're not the same at all - one initialises the value, and then never changes it (even though it's not const; the other allocates the space, assigns a value (which wouldn't be doable if it were const). If you give your compiler a chance and tell it they're const I'm sure it'll optimise it away for you. – UKMonkey Jan 17 '18 at 14:24
  • @bipll If so, then it is quite interesting why none of these three popular compilers (clang, gcc, msvc) actually apply the as-if rule in the described case in 2018. – Taras Jan 17 '18 at 14:24
  • 3
    [Godbolt link](https://godbolt.org/g/DDn9qm) to play with. – nwp Jan 17 '18 at 14:25
  • @FrançoisAndrieux I tried many versions of clang (from 3.8 to 5.0), many versions of gcc (from 4.9.2 to 7.2), with both -O2 and -O3 flags. As well as MSVC 19 2017 with /Ox. – Taras Jan 17 '18 at 14:30
  • @UKMonkey I'm sorry, I didn't understand your explanation :( For me, both functions in my original listing look like they would produce the same assembly. The only difference is that in the second version, the array and its size are wrapped in a single struct. I don't use any const qualifiers in the first version, but it gets optimized anyway. – Taras Jan 17 '18 at 14:35
  • @UKMonkey [Even if size is declared const](https://godbolt.org/g/FxBGoe) inside the class, the compiler does not optimize the unusefull away. So this is a weakness common to all compilers. – Oliv Jan 17 '18 at 14:37
  • @UKMonkey `const` does not help because the compiler assumes the called function does `const_cast`. If the definition of the function is available in the same translation unit the compiler again does not need `const` because it can see what the function actually does. – Maxim Egorushkin Jan 17 '18 at 15:18

1 Answers1

6

This is because void process_value(double& ref_value); accepts the argument by reference. The compiler/optimizer assumes aliasing, i.e. that process_value function can change memory accessible through reference ref_value and hence that size member after the array.

The compiler assumes that because the array and size are members of one same object array_wrapper function process_value can potentially cast the reference to the first element (on the first invocation) to the reference to the object (and store it elsewhere) and cast the object to unsigned char and read or replace its entire representation. So that after the function returns the state of the object must be reloaded from memory.

When size is a stand-alone object on the stack the compiler/optimizer assumes that nothing else could possibly have a reference/pointer to it and caches it in a register.

In Chandler Carruth: Optimizing the Emergent Structures of C++ he explains why the optimizers have difficulty when calling functions accepting reference/pointer arguments. Use reference/pointer function arguments only when absolutely necessary.

If you would like to change the value the more performant option is:

double process_value(double value);

And then:

array_wrapper.arr[i] = process_value(array_wrapper.arr[i]);

This change results in optimal assembly:

.L23:
movsd xmm0, QWORD PTR [rbx]
add rbx, 8
call process_value2(double)
movsd QWORD PTR [rbx-8], xmm0
cmp rbx, rbp
jne .L23

Or:

for(double& val : arr)
    val = process_value(val);
Maxim Egorushkin
  • 131,725
  • 17
  • 180
  • 271
  • 1
    That unfortunately drastically limits the amount of processing `process_value` can do. – nwp Jan 17 '18 at 14:45
  • At least in c++17 you cannot get access to size from a reference to the element of the array member. I suppose compiler do not apply C++ aliasing rule but C ones – Oliv Jan 17 '18 at 14:46
  • @nwp I do not think so. – Maxim Egorushkin Jan 17 '18 at 14:47
  • 1
    @Oliv The optimizer disagrees with you. – Maxim Egorushkin Jan 17 '18 at 14:52
  • @MaximEgorushkin In c++17 i am sure of me. I don't care if it is easier to perform optimization on a intermediate language where most of the semantic is lost. – Oliv Jan 17 '18 at 14:55
  • @Oliv That link nwp provided in the comments compiles it in C++17 mode and proves you wrong. – Maxim Egorushkin Jan 17 '18 at 14:56
  • @Oliv Added a longer explanation. – Maxim Egorushkin Jan 17 '18 at 15:08
  • @MaximEgorushkin The modification made to the standard in C++17 aim exactly at avoiding those optimization barrier. Whatsoever, think of the case where `size` member is declared const ([Compiler Explorer link](https://godbolt.org/g/4W9pS6)). The optimizer still perform a load, while it is explicitly stated that changing the value of an object declared const is UB. Compiler just fails because their over simplifly the code before performing optimization. – Oliv Jan 17 '18 at 16:08
  • @Oliv Could you provide a link to the modification you refer to? – Maxim Egorushkin Jan 17 '18 at 16:16
  • @Oliv In your example you have a `const` member/sub-object of a non-const object. – Maxim Egorushkin Jan 17 '18 at 16:16
  • 1
    [\[dcl.type.cv\]/4](https://timsong-cpp.github.io/cppwp/n4659/dcl.type.cv#4) "any attempt to modify a const object during its lifetime results in undefined behavior." Notice also that a suboject is an object. There are no special rule for const subobject in the standard!! But I think you already knew that, no? So how would you qualify your last comment? – Oliv Jan 17 '18 at 16:29
  • 2
    @Oliv Since complete object `array_wrapper` is originally created as non-const, it is well defined to cast it back and forth to const and non-const. And casting an object to `unsigned char` and replacing the object representation is also well defined. Is it not? – Maxim Egorushkin Jan 17 '18 at 16:34
  • Come on! "**any attempt** to modify..." it can not be clearer. Why do you place so much confidence on compilers? Don't you have never seen them fail? – Oliv Jan 17 '18 at 16:37
  • @Oliv Look at [dcl.type.cv.4](https://timsong-cpp.github.io/cppwp/n4659/dcl.type.cv#4), `ip = const_cast(cip); // cast needed to convert const int* to int*` and then `*ip = 4; // defined: *ip points to i, a non-const object`. – Maxim Egorushkin Jan 17 '18 at 16:38
  • And what happen when on the next line `*iq = 4; // undefined: modifies a const object`. Come on, you are even forbiding your self to read the next line. – Oliv Jan 17 '18 at 16:40
  • 1
    @Oliv On that line it modifies the originally const object. Please read what your cite carefully. – Maxim Egorushkin Jan 17 '18 at 16:41
  • So you modify your example, this time you forget to read the first line where cip is defined : `int i; const int* cip; cip=&i`. cip is a const pointer to a *non-const object* – Oliv Jan 17 '18 at 16:42
  • Incredible it is written `// undefined: modifies a const object`. **undefined** don't you realy see it??? – Oliv Jan 17 '18 at 16:43
  • @Oliv In the example in question `array_wrapper` is originally a non-const object. And the compiler assumes that `array_wrapper` is modified. Post a compiler bug report if you disagree. – Maxim Egorushkin Jan 17 '18 at 16:46
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/163342/discussion-between-oliv-and-maxim-egorushkin). – Oliv Jan 17 '18 at 16:46
  • @Oliv That's interesting. Is a const-qualified filed of a non-const class object considered a "true constant which must never be modified" from the standard's point of view? – Taras Jan 17 '18 at 16:50
  • @Taras honestly, I don't understand your sentence, could you rephrase it? – Oliv Jan 17 '18 at 16:53
  • You asked me about the change in the C++ standard, here a [link](https://stackoverflow.com/questions/48062346/is-a-pointer-with-the-right-address-and-type-still-always-a-valid-pointer-since). Whatsoever, size was not accessible from an element of the array without implementation defined behavior in C++14 and C++11 too: simple reason , pointer arithmetic restrictions. – Oliv Jan 17 '18 at 16:56
  • @Taras Look at [intro.object/2](https://timsong-cpp.github.io/cppwp/n4659/intro.object#2). – Maxim Egorushkin Jan 17 '18 at 16:57
  • @Oliv struct Object { const int x; }; For instance, I have a non-const Object object { 42 };, and then I do something like this: const_cast(object.x) = 100; The question is: is this line considered UB or not? It might be, because I attempted to modify a value initially declared as const. But on the other hand, it is not a stand-alone constant, but a member of another object - which itself is not const. – Taras Jan 17 '18 at 17:00
  • @Taras this is UB, because you are modifying the value of a const object. The fact this object is a subobject of a non const object does not change the fact that this object is const. – Oliv Jan 17 '18 at 17:04
  • @Oliv Okay, then indeed the compiler could optimize all reads of [rsp+512] in the loop, because the size could not be changed (otherwise it is UB anyway). But it could not optimize away the write to [rsp+512] before the loop, because it thinks that someone inside a call to process_value(...) might want to read from that location and get the size. – Taras Jan 17 '18 at 17:11
  • @Oliv You keep ignoring the fact that _objects can contain other objects, called subobjects. A subobject can be a **member subobject**, a base class subobject, or an array element. An object that is not a subobject of any other object is called a **complete object**._ Here, `array_wrapper.size` is a _subobject_, not a _complete object_. – Maxim Egorushkin Jan 17 '18 at 17:15
  • @Taras Try to find an expression that would make possible inside the body of `process_value` to get access to `size` without UB. (Actualy their is one implementation defined behavior that could do it, and if you find it out will discuss why it is irrelevant inside the body of a function declared as `process_value`). – Oliv Jan 17 '18 at 17:17
  • @MaximEgorushkin And then, what does it change for size subobject? – Oliv Jan 17 '18 at 17:17
  • @Oliv It changes its storage and lifetime. `size`s storage is that of `array_wrapper`, which is non-const. There is a chance that the authors of the 3 compilers are correct, don't you think? – Maxim Egorushkin Jan 17 '18 at 17:19
  • @MaximEgorushkin Yes they both are occupying the storage associated to the complete they are a member, but does not change anything to the modifiability of the value of size, neither it actualy change the accessibility of size from an array_wrapper element. – Oliv Jan 17 '18 at 17:22
  • @Oliv Okay, whatever, post bug reports to 3 compilers. – Maxim Egorushkin Jan 17 '18 at 17:23
  • @MaximEgorushkin As I already said you, this is not a compiler bug since it does not change the program observable behavior, this is just a missed optimization opportunity. Compiler, clang GCC and MSVC have already so many bug that do change program compilability and observable behavior, that this kind of optimization opportunity is irrelevant. Specialy if it would oblige them to deeply change their compiler infrastructure. – Oliv Jan 17 '18 at 17:25
  • @Oliv We agree to disagree agreeably. – Maxim Egorushkin Jan 17 '18 at 17:28
  • @Oliv It's hard to find such expression for my array+size example. But there might be something like this: struct Baz { Foo foo; Bar bar; }; -- some function may take a reference to Foo, get that Foo's address, add sizeof(Foo) to it, and reinterpret_cast it to Bar. That's what C++ allows. And I dislike it very much, because it sacrifices performance in order to allow some coders to write ugly hacks. Or maybe I just don't completely understand something there. Why was such "feature" needed in the first place?.. – Taras Jan 17 '18 at 17:29
  • @Taras With this precise class, which is a standard layout class you can get a pointer to a Baz from foo (but not from bar) so accessing bar exactly this way (and not using pointer arithmetic as you do) is allowed `*reinterpret_cast(a_Baz.foo).bar`. In the case of your question this trick is not possible because you can not get a pointer to an array from one of its element: [basic.compound/4](https://timsong-cpp.github.io/cppwp/n4659/basic.compound#4). – Oliv Jan 17 '18 at 17:37
  • @Taras But indeed before C++17 it would be allowed because of this [deleted sentence](https://stackoverflow.com/questions/48062346/is-a-pointer-with-the-right-address-and-type-still-always-a-valid-pointer-since) – Oliv Jan 17 '18 at 17:38
  • @Taras, So let's say that inside in a function `process_value(double& a)` you have some code that try to get access to size, how would you do it, using your trick for example? – Oliv Jan 17 '18 at 17:40
  • @Oliv I wouldn't do it even if I wanted to, because I have no knowledge of current element's index, and thus cannot calculate the desired offset. – Taras Jan 17 '18 at 18:02
  • 2
    My guess is that this is a combination of these subtle aliasing concerns combined with several compiler missed optimizations. The struct is standard layout and (I believe) it is valid for `process_value()` to cast it back to the original struct if it knows the offset. Even if the index isn't passed in, `process_value()` can still "know" the index by other means (such as counting the # of times it's called). Yes, this would be terrible code. But semantically it's valid without UB and the compiler must respect it. – Mysticial Jan 17 '18 at 19:43
  • 1
    But these still aren't enough to drop the optimization. Compilers generally don't do const-ness optimizations. I'm also unsure if the locally declared aspect of the struct has any semantic value in invalidating this optimization. But the optimizer probably doesn't care and assumes it's visible everywhere. In reality, the compiler is probably dropping the optimization the moment it sees *anything* inside the struct having its "address taken". This suppresses the Scalar Replacement of Aggregates optimization which would be required to isolate the size member. – Mysticial Jan 17 '18 at 19:43
  • @Mystical Actualy even if one knew how many byte there were between the double and the `size` member, one could not get a pointer to the `size` member without implementation defined behavior. Due to the rule of pointer arithmetic one should reinterpret_cast the pointer to `intptr_t` then add the shift then reinterpret_cast the result to `size_t*`.[\[expr.reinterpret.cast\]](https://timsong-cpp.github.io/cppwp/n4659/expr.reinterpret.cast#5). So, as you say, hopefully it is not enough to drop optimization because this implementation defined reinterpret_cast has the potential to do anything! – Oliv Jan 17 '18 at 20:28
  • 1
    It is allowed if it's the first member. ["A pointer to a standard-layout class may be converted (with reinterpret_cast) to a pointer to its first non-static data member and vice versa."](http://en.cppreference.com/w/cpp/types/is_standard_layout) So it's not UB for the `process_value()` to (only on the first invocation) cast the reference to the original struct and do what it wants with it. So semantically, I think the only thing that saves the optimization is the `const`. But compilers don't really use `const` for optimizations. – Mysticial Jan 17 '18 at 20:38
  • Btw, I don't think that this issue is related to some standardese stuff about past-the-end pointer in array. Here's a simplified example without any arrays at all: https://godbolt.org/g/SD1erC -- the same issue is still there. – Taras Jan 18 '18 at 12:23
  • @Taras It has nothing to do with _past-the-end pointer in array_, not sure how you inferred that. – Maxim Egorushkin Jan 18 '18 at 21:56
  • I expected `std::for_each` to fix it because it removes the need to reload `size` repeatedly, [but it doesn't seem so](https://godbolt.org/g/w4dYyK). – nwp Jan 19 '18 at 13:25