37

I played around with Godbolt's CompilerExplorer. I wanted to see how good certain optimizations are. My minimum working example is:

#include <vector>

int foo() {
    std::vector<int> v {1, 2, 3, 4, 5};
    return v[4];
}

The generated assembler (by clang 5.0.0, -O2 -std=c++14):

foo(): # @foo()
  push rax
  mov edi, 20
  call operator new(unsigned long)
  mov rdi, rax
  call operator delete(void*)
  mov eax, 5
  pop rcx
  ret

As one can see, clang knows the answer, but does quite a lot of stuff before returning. It seems to my that even the vector is created, because of "operator new/delete".

Can anyone explain to me what happens here and why it does not just return?

The code generated by GCC (not copied here) seems to construct the vector explicitly. Does anyone know GCC is not capable to deduce the result?

Max Görner
  • 674
  • 7
  • 16
  • 15
    I think C++ compiler is not yet allowed to remove calls to global `operator new` and `operator delete`. There was a proposal that would allow compiler doing this, I am not sure if it made into a standard. Don't forget that both global operators can be `overriden`. –  Nov 02 '17 at 09:59
  • 4
    @Max Görner Defining a vector has a side effect of allocating memory that can be not enough. – Vlad from Moscow Nov 02 '17 at 10:01
  • 3
    Standard Library calls are generally considered _observable effects_ which are barred from elimination, or even reordering. – MSalters Nov 02 '17 at 10:01
  • 3
    There might be an overload of global operator new/delete in another translation unit that causes side effects – M.M Nov 02 '17 at 10:02
  • 4
    Everything the guys above me said, really belongs in an answer – StoryTeller - Unslander Monica Nov 02 '17 at 10:04
  • 1
    @VladfromMoscow: That is actually insufficient to bar optimization - the Standard does not prescribe how much memory there should be, and this may vary between runs. Hence, under the as-if rule, you couldn't notice an optimization that saves memory. – MSalters Nov 02 '17 at 10:05
  • @MSalters The work of the program depends on this side effect. So the observed behavior will be changed if to remove the vector definition. – Vlad from Moscow Nov 02 '17 at 10:08
  • @Vlad - you would notice - through the absence of an expected throwing of an exception - such an optimisation in an environment with insufficient available memory for the dynamic memory allocation to succeed. – Peter Nov 02 '17 at 10:10
  • The optimization of this vector is ( as most such questions ) absolutly irrelevant in real live. –  Nov 02 '17 at 10:13
  • @Peter: The Standard allows many possible behaviors, and doesn't require run-to-run consistency. You might have 1 byte of heap the first run, and `ULONGLONGMAX` on the second run. – MSalters Nov 02 '17 at 10:15
  • @MSalters - I'm not suggesting run-to-run consistency is required. I'm saying it would be possible to detect the presence of such an optimisation by deliberately limiting memory available to be allocated. – Peter Nov 02 '17 at 10:18
  • @Peter: The Standard disagrees. It describes the behavior of a C++ in terms of an unspecified abstract machine, not your actual (possibly limited) physical machine. In case you missed it, no realistic hardware has 2^64 bytes of memory available. – MSalters Nov 02 '17 at 10:23
  • @Peter applying the same reasoning, also unused automatic primitive variables should never be optimized out because one could deliberately limit the stack space at runtime. In fact, I'd bet any optimization can be detected by deliberately changing the environment. – Massimiliano Janes Nov 02 '17 at 10:30
  • 1
    Gcc does not yet mark operator new/delete as magic functions (while clang does). `inline void* operator new(std::size_t n){return malloc(n);} inline void operator delete(void*p)noexcept{free(p);}` lets it optimize to just `return 5;`. – Marc Glisse Nov 02 '17 at 10:50
  • 1
    I don't intend to be antagonistic, but the standard doesn't mandate *any* optimizations (ok, except for certain cases of copy/move elision in C++17). If this optimization is valid, but GCC doesn't do it, the most likely explanation is that no one bothered to implement it. – Arne Vogel Nov 02 '17 at 11:21
  • @Ivan see [Is the compiler allowed to optimize out heap memory allocations?](https://stackoverflow.com/q/31873616/1708801) – Shafik Yaghmour Nov 02 '17 at 17:53

4 Answers4

29

std::vector<T> is a fairly complicated class that involves dynamic allocation. While clang++ is sometimes able to elide heap allocations, it is a fairly tricky optimization and you should not rely on it. Example:

int foo() {
    int* p = new int{5};
    return *p;
}
foo():                                # @foo()
        mov     eax, 5
        ret

As an example, using std::array<T> (which does not dynamically allocate) produces fully-inlined code:

#include <array>

int foo() {
    std::array v{1, 2, 3, 4, 5};
    return v[4];
}
foo():                                # @foo()
        mov     eax, 5
        ret

As Marc Glisse noted in the other answer's comments, this is what the Standard says in [expr.new] #10:

An implementation is allowed to omit a call to a replaceable global allocation function ([new.delete.single], [new.delete.array]). When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression. The implementation may extend the allocation of a new-expression e1 to provide storage for a new-expression e2 if the following would be true were the allocation not extended: [...]

Vittorio Romeo
  • 90,666
  • 33
  • 258
  • 416
  • but the problem is, what does the standard say about allowing such optimizations ? for example, if you make that int* volatile the new is not optimized, suggesting that clang considers the evaluation of new not an observable sideeffect by itself, but is this correct ? – Massimiliano Janes Nov 02 '17 at 10:36
  • @MassimilianoJanes: I think these optimizations fall under the [as if rule](https://stackoverflow.com/questions/15718262/what-exactly-is-the-as-if-rule). If clang can figure out the the observable behavior doesn't change, it can elide things like `new` and `delete`. – Vittorio Romeo Nov 02 '17 at 10:43
  • 3
    according to [Clarifying Memory Allocation](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html) it has not always been so clear-cut whether these are observable or not. It would be interesting to see exactly what has changed since then ... – Massimiliano Janes Nov 02 '17 at 10:58
  • Note also discussed in [Is the compiler allowed to optimize out heap memory allocations?](https://stackoverflow.com/q/31873616/1708801) – Shafik Yaghmour Nov 02 '17 at 17:52
7

As the comments note, operator new can be replaced. This can happen in any Translation Unit. Optimizing a program for the case it's not replaced therefore requires Whole-Program Analysis. And if it is replaced, you have to call it of course.

Whether the default operator new is a library I/O call is unspecified. That matters, because library I/O calls are observable and therefore they can't be optimized out either.

MSalters
  • 173,980
  • 10
  • 155
  • 350
  • I'm not sure that `std::vector` is *required* to `call std::allocator::allocate`. For example, SSO is allowable. – Richard Hodges Nov 02 '17 at 10:19
  • There was a proposal that tries to allow compiler from removing paired call to `new` and `delete`, regardless of whether they replaced or not. Moreover it was supposed to be allowed to replace small heap allocations with allocation on stack. This is an optimization that languages as Java and C# are capable of, but not C++. Or at least I am not sure if this was added to C++ standard and if compilers do this or not. Ofc such optimization is that as relevant for C++ as for Java and C#. –  Nov 02 '17 at 10:20
  • 4
    @RichardHodges: SSO is not allowed for vector, because swapping two small vectors would then require swapping their elements, and swap for containers is not allowed to move, copy or swap any elements (`array` is an exception, and `string` is not really a container; see [container.requirements.general]/9). –  Nov 02 '17 at 10:27
  • @Fanael I see, thank you. So we're left with a custom stack-based allocator to get this optimisation. – Richard Hodges Nov 02 '17 at 10:30
  • 4
    "An implementation is allowed to omit a call to a replaceable global allocation function" <-- directly from the standard [expr.new] – Marc Glisse Nov 02 '17 at 10:56
4

N3664's change to [expr.new], cited in one answer and one comment, permits new-expressions to not call a replaceable global allocation function. But vector allocates memory using std::allocator<T>::allocate, which calls ::operator new directly, not via a new-expression. So that special permission doesn't apply, and generally compilers cannot elide such direct calls to ::operator new.

All hope is not lost, however, for std::allocator<T>::allocate's specification has this to say:

Remarks: the storage is obtained by calling ​::​operator new, but it is unspecified when or how often this function is called.

Leveraging this permission, libc++'s std::allocator uses special clang built-ins to indicate to the compiler that elision is permitted. With -stdlib=libc++, clang compiles your code down to

foo():                                # @foo()
        mov     eax, 5
        ret
T.C.
  • 133,968
  • 17
  • 288
  • 421
  • When I asked in February, Richard Smith, one of the authors of N3664, replied that indeed the wording in the library, taken literally, does not provide the liberties we want, but the intention was always to give the same liberties as with new-expressions, and someone should fix the text to match the intent. – Marc Glisse Nov 03 '17 at 21:46
  • @MarcGlisse Perhaps, but so far the set of "someone"s is an empty set. – T.C. Nov 07 '17 at 19:45
0

The compiler can't optimise heap-related code, as heap-related code is run-time specific. The heap-related code is the use of a std::vector, which holds the managed data in heap memory.

In your example all values as well as the size is known at compile-time, therefore it's possible to use std::array instead, which is initialized by aggregate initialization, and therefore may be constexpr-qualified.

Changing your example using std::array reduces your function to your expected output:

#include <array>

int foo() {
    std::array<int,5> v {1, 2, 3, 4, 5};
    return v[4];
}
foo():                                # @foo()
        mov     eax, 5
        ret

Using the given function will still result in a call to foo(). In order to eliminate the call, the function must be qualified as constexpr:

#include <array>

constexpr int foo() {
    constexpr std::array<int,5> v {1, 2, 3, 4, 5};
    return v[4];
}

int main() {
    return foo();
}
main:                                   # @main
        mov     eax, 5
        ret
gkhaos
  • 694
  • 9
  • 20