28
#include <array>
#include <cassert>

class P {
  public:
    P() : _value(nullptr) {}
    ~P() { delete _value; }

  private:
   char *_value;
};

void foo() {
  if(std::array<P, 4>().size() != 4)
    assert(false);
}

The function foo() creates a temporary array to check that its size is what the programmer expected. With -O1 or higher, g++ figures out that the assert cannot fail and removes the call to __assert_fail from the generated code. But g++ still generates code to first construct and then destruct the now-unused array.

g++ -std=c++11 -O3 [4.8.2]:

0000000000000000 <_Z3foov>:
   0:       55                      push   %rbp
   1:       66 0f ef c0             pxor   %xmm0,%xmm0
   5:       53                      push   %rbx
   6:       48 83 ec 28             sub    $0x28,%rsp
   a:       66 0f 7f 04 24          movdqa %xmm0,(%rsp)
   f:       48 8d 5c 24 20          lea    0x20(%rsp),%rbx
  14:       48 89 e5                mov    %rsp,%rbp
  17:       66 0f 7f 44 24 10       movdqa %xmm0,0x10(%rsp)
  1d:       0f 1f 00                nopl   (%rax)
  20:       48 83 eb 08             sub    $0x8,%rbx
  24:       48 8b 3b                mov    (%rbx),%rdi
  27:       e8 00 00 00 00          callq  2c <_Z3foov+0x2c>
  2c:       48 39 eb                cmp    %rbp,%rbx
  2f:       75 ef                   jne    20 <_Z3foov+0x20>
  31:       48 83 c4 28             add    $0x28,%rsp
  35:       5b                      pop    %rbx
  36:       5d                      pop    %rbp
  37:       c3                      retq

clang, on the other hand, removes everything except the return instruction.

clang -std=c++11 -O3:

0000000000000000 <_Z3foov>:
   0:       c3                      retq

Is this just bad luck with g++, or is there a reason for the difference?

gturri
treap
  • 3
    Those are different compilers with different optimization strategies!? – StoryTeller - Unslander Monica Jan 01 '14 at 15:53
  • 1
    Could you try adding something like `P() : _value(nullptr) { std::cout << "here" << std::endl; }` to the constructor, and try clang again? – Sergey Kalinichenko Jan 01 '14 at 15:59
  • 2
    @dasblinkenlight - side effects from the constructor should of course not be optimized away. Since constructors and destructors are inlined, the compiler can see there are no side effects in the original code and the question is why g++ can't use this knowledge to eliminate the temporary array. – treap Jan 01 '14 at 16:06
  • 1
    clang's constant propagation is superior (in this case anyway). – egur Jan 01 '14 at 18:12
  • 2
Constant propagation works in both clang and g++, since the assert is eliminated; after that it's dead-code elimination that removes the array. – bolov Jan 01 '14 at 22:05
  • I suspect it's about memory access. In the most conservative way, any memory access through a pointer is considered to alter the entire memory, since the value of the pointer is not known at compile time. Maybe clang has an algorithm that can conclude that in this case the memory access does nothing. – bolov Jan 01 '14 at 22:08
  • 1
`delete` is just a function which is linked in, and therefore g++ can't see whether it has side effects, so it has to assume that it does. Clang in this case might have special handling, knowing that `delete nullptr` has no side effects. – Vinzenz Jan 03 '14 at 10:11
  • 1
    @Vinzenz - The semantics of delete is defined in the C++ standard (3.7.4.2 Deallocation functions), so it is just not an arbitrary library function. And substituting "delete _value" by "if(_value) delete _value" in the actual code doesn't improve the result from g++. – treap Jan 03 '14 at 13:55
@treap Well, I know that this is defined by the standard. It was just a guess as to why g++ might not do it. But the fact that it still doesn't remove the code after adding the `if` shows that this doesn't help. – Vinzenz Jan 03 '14 at 15:13
This is because of gcc's policy of [using weak linkage for operator delete](http://stackoverflow.com/questions/15525537/what-are-practical-applications-of-weak-linking): it simply doesn't know what the call to delete is going to do. It is gcc that does the 'special handling'. – user3125280 Jan 18 '14 at 17:55
  • clang also fails if you replace 4 with something a bit larger. – Marc Glisse Jan 19 '14 at 09:35
To make this clear to people reading: `e8 00 00 00 00 callq 2c <_Z3foov+0x2c>` is a placeholder for a call to an undefined function, i.e. one that will be linked in later. I am quite sure it is operator delete, but OP will need to list the code properly (with symbolic information). – user3125280 Jan 19 '14 at 10:27

3 Answers

5

Firstly, great question.

EDIT:

I didn't really read your code properly the first time. There is an important external call in your code, in the instruction `e8 00 00 00 00 callq 2c <_Z3foov+0x2c>`. It does not call the address ahead of itself - instead its target will get replaced at link time. This is how linking works - the ELF file says "such and such an instruction will resolve to this target at link time." The assembly has not been listed completely, so we don't know the address of this call. Presumably it is the `delete _value` in the code; however, libstdc++ (with delete, etc.) is dynamically linked by default. You can change this by using my compiler flags, or do your listing inside gdb (i.e. after linking).

Back to the answer:

This is a special case where the gcc programmers have made a choice not to optimize. The reason is that operators new and delete are marked as 'weak symbols', and the linker will look for alternatives, choosing a user-supplied version or falling back to the built-in one if none is found.

Here is a discussion of the rationale behind this. These operators are designed to be globally replaceable, and weak linking is one solution.

Statically linking doesn't change this, because free and malloc in glibc can still be replaced at link time.

Statically linking, or using link-time optimization, should let gcc use the built-in delete; however, even when linking statically the optimization is still missed in this case. In your original (dynamically linked) example there is no such chance in the first place.

Compiling this:

#include <array>
#include <cassert>
#include <cstdlib>

void * operator new(std::size_t n) throw(std::bad_alloc)
{
  return malloc(n);
}

void operator delete(void * p) throw()
{
  if (p != nullptr)
    free(p);
}

class P {
  public:
    P() : _value(nullptr) {}
    ~P() { delete _value; }

  private:
   char *_value;
};

void foo() {
  if(std::array<P, 4>().size() != 4)
    assert(false);
}

int main() {
  foo();
}

with this: `g++ -std=c++11 -O3 -static -Wa,-alh test.cpp -o test`

links libstdc++/glibc statically, so gcc should know what free and malloc are; however, it still doesn't quite realise that foo is trivial.

However, nm -gC test | grep free produces the weak symbol __free_hook, which I found an explanation for here. So the behaviour of free and malloc (and hence of operator new and delete) is actually always changeable at link time when compiling with gcc. That is what prevents the optimization - annoyingly, even -fno-weak leaves those symbols there.

The above paragraph is true, but is a result of the optimization being missed, not a reason to avoid optimization.

With link time optimization it is possible gcc can be coaxed into doing this optimization, but first you must build everything else with -flto, including libstdc++.

Here is the most gcc does for me when building the example statically:

Dump of assembler code for function _Z3foov:
0x08048ef0 <+0>:    push   %esi
0x08048ef1 <+1>:    push   %ebx
0x08048ef2 <+2>:    sub    $0x24,%esp
0x08048ef5 <+5>:    movl   $0x0,0x10(%esp)
0x08048efd <+13>:   lea    0x20(%esp),%ebx
0x08048f01 <+17>:   movl   $0x0,0x14(%esp)
0x08048f09 <+25>:   lea    0x10(%esp),%esi
0x08048f0d <+29>:   movl   $0x0,0x18(%esp)
0x08048f15 <+37>:   movl   $0x0,0x1c(%esp)
0x08048f1d <+45>:   lea    0x0(%esi),%esi
0x08048f20 <+48>:   sub    $0x4,%ebx
0x08048f23 <+51>:   mov    (%ebx),%eax
0x08048f25 <+53>:   test   %eax,%eax
0x08048f27 <+55>:   je     0x8048f31 <_Z3foov+65>
0x08048f29 <+57>:   mov    %eax,(%esp)
0x08048f2c <+60>:   call   0x804dea0 <free>
0x08048f31 <+65>:   cmp    %esi,%ebx
0x08048f33 <+67>:   jne    0x8048f20 <_Z3foov+48>
0x08048f35 <+69>:   add    $0x24,%esp
0x08048f38 <+72>:   pop    %ebx
0x08048f39 <+73>:   pop    %esi
0x08048f3a <+74>:   ret  

The call to free isn't going anywhere.

If it is a problem, optimize it yourself. Templates make this easy (assuming std::array is passed as a template argument; otherwise, why are you checking its size()?).

#include <array>
#include <cassert>

class P {
  public:
    P() : _value(nullptr) {}
    ~P() { delete _value; }

  private:
   char *_value;
};

void foo() {
  if(std::tuple_size<std::array<P, 4> >::value != 4)
    assert(false);
}

int main() {
  foo();
}

The code can be made to fail silently if std::array<P, 4> is replaced by a vector or some other container, falling back on your default construction method.

nm -C test outputs W operator new(unsigned int, void*) when I added #include <new>, so placement new at least can be changed link time (it is a weak symbol) - the others are special, and their eventual targets reside in libstdc++ (again, they are linked dynamically by default).

user3125280
  • I should probably add that checking for _value == nullptr doesn't fix this, but that is another story. – user3125280 Jan 19 '14 at 02:25
  • Did you check what happens when you replace 4 with 1? – Marc Glisse Jan 19 '14 at 09:26
  • @cmaster I fixed my answer to address this issue (no pun intended) – user3125280 Jan 19 '14 at 10:36
Also, after passing this on to the gcc people, they say it does get optimized away if gcc's loop unroller is told to do it; however, none of the optimization switches do this (it would require a patch, and more time spent optimizing, and would rarely be helpful). – user3125280 Jan 19 '14 at 10:47
  • @MarcGlisse I believe building statically with -O3 and 1 instead of 4 gives the right results. – user3125280 Jan 19 '14 at 13:12
  • `new` is not used in this code, you can remove it. It is irrelevant that the call is to `free` and not `random_opaque_function`, the test `p!=0` is what matters. – Marc Glisse Jan 19 '14 at 13:28
  • @MarcGlisse the point is that a) the functions are not opaque when statically linking and b) free/delete are built in with standard behaviour. They are made opaque by default by gcc (and not clang apparently) to facilitate memory analysis. Yes the p!=0 is important, and gcc misses it when statically linking *as well* (the OP is dynamically linking) because it doesn't unroll the array constrution. – user3125280 Jan 19 '14 at 13:35
  • Did you try compiling free(0) with gcc (no need for static or no-weak or anything like that)? – Marc Glisse Jan 19 '14 at 15:20
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/45601/discussion-between-user3125280-and-marc-glisse) – user3125280 Jan 20 '14 at 01:48
  • @user3125280 the call to delete being opaque or not is a separate issue. I'm curious about why gcc generates worse code for an unused array than for a std::pair,std::pair (replace size() by some other code that can be evaluated at compile time). – treap Jan 20 '14 at 20:59
  • 2
    @treap the difference is that for an array, you get a loop (which gcc fails to unroll) while with the pairs you get the flat version directly. – Marc Glisse Jan 20 '14 at 22:33
  • @Marc Glisse Is this an expected result from the optimizer, that it is harder to optimize a loop with a fixed number of iterations than code that is unrolled from the beginning? If so I will avoid using loops in performance critical code. (Maybe I could unroll loops using template recursion instead.) – treap Jan 22 '14 at 14:40
  • 1
    @treap careful about generalities. It is not easy to guess if unrolling will be profitable, it sometimes isn't (otherwise compilers would just go ahead and always do it). So unless you have found a specific place where the compiler did not unroll but you happen to know that unrolling would enable large simplifications (like here where we can remove the code completely), avoid premature optimizations. – Marc Glisse Jan 22 '14 at 15:04
0

Because there could be side effects in the constructor of std::array. Note also that g++ and clang don't use the same standard library by default (libstdc++ for g++ and libc++ for clang).

As for why clang removes the code: maybe in libc++ the std::array constructor is inlined, so the optimizer can see that there are no side effects and therefore no need to keep the code.

delehef
  • The test with clang is also with libstdc++. The OP posted that in a comment on a now-deleted answer, and on my system, I get the same results. clang doesn't need to use libc++. –  Jan 03 '14 at 21:39
0

This could happen for any number of reasons. Perhaps gcc doesn't inline as well as clang for some reason. Turning up the inlining knobs on gcc might help. Or perhaps there is something else going on like an aliasing issue gcc can't resolve for some reason. It's impossible to know without tracing through gcc to discover the details.

Bottom line, it's just two different compilers doing different code transformations. gcc could be enhanced to cover this case.

David Greene
  • What is not inlined? The output contains no external calls. – treap Jan 06 '14 at 22:58
  • @treap operator delete **is not inlined**, as seen in the unresolved callq instruction. How are you getting the code listing? – user3125280 Jan 19 '14 at 13:10
  • @David Greene The output is from "g++ -std=c++11 -O3 -c t.cc && objdump -d t.o". Operator delete is not inlined, but if I change to "if(_value) delete _value" then the if statement will be inlined. Still it does not help. – treap Jan 20 '14 at 20:22