clang vs gcc - optimization including operator new

Question

I have this simple example I was testing against and I noticed that gcc optimizations (-O3) seems not be as good as clang ones when operator new is involved. I was wondering what might be the issue and if it possible to force gcc to produce more optimized code somehow?

template<typename T>
T* create() { return new T(); }

int main() {
    auto result = 0;
    for (auto i = 0; i < 1000000; ++i) {
        result += (create<int>() != nullptr);
    }

    return result;
}


#clang3.6++ -O3 -s --std=c++11 test.cpp
#size a.out
   text    data     bss     dec     hex filename
   1324     616       8    1948     79c a.out
#time ./a.out 
real 0m0.002s
user 0m0.001s
sys  0m0.000s

#gcc4.9 -O3 -s --std=c++11 test.cpp
#size a.out
   text    data     bss     dec     hex filename
   1484     624       8    2116     844 a.out
#time ./a.out
real 0m0.045s
user 0m0.035s
sys  0m0.009s

Example above is just a simple version of the code I have been testing in the beginning, but it still illustrates the difference between gcc/clang. I checked the assembly code as well and there is not a huge difference in size, but definitely in performance. On the other hand maybe clang is doing something which is not allowed?

Using [godbolt](http://gcc.godbolt.org/#) for this example `clang` seems to optimize away all the code to a `movl $1000000, %eax` while `gcc` does not. — Shafik Yaghmour, Sep 04 '14 at 14:47
yea, it's really wired, I tired just with value semantic - http://goo.gl/h3S47K - and gcc optimize all the code the same way as clang, but whenever operator new is used - http://goo.gl/3129uS - then optimization is not applied, was wondering why? — Kris, Sep 04 '14 at 15:05
I suppose it depends whether the optimization clang is doing falls under the as-if rule or not. — Shafik Yaghmour, Sep 04 '14 at 15:06
good point, so the question would be if operator new might be a part of as-if rule or not? — Kris, Sep 04 '14 at 15:14
Intuitively speaking, leaking memory does *not* fall under the as-if rule but it's probably a question for standard experts. — Mark B, Sep 04 '14 at 15:39
@MarkB yes but if `new` throws that would change the programs return value which as far as I understand falls under observable behavior. — Shafik Yaghmour, Sep 04 '14 at 15:43
Mostly, gcc considers operator new as a regular function. If you provide an inline version of operator new that calls malloc, then gcc may be able to do some magic. — Marc Glisse, Sep 04 '14 at 16:26

score 14 · Accepted Answer · edited May 23 '17 at 11:45

14

If we plug this code into godbolt we can see that clang optimizes away the code to this:

main:                                   # @main
movl    $1000000, %eax          # imm = 0xF4240
ret

while gcc does not perform this optimization. So the question is this a valid optimization? Does this follow the as-if rule, which is noted in the draft C++ standard section 1.9 Program execution which says(emphasis mine):

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.⁵

where note 5 says:

This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.

Since new could throw an exception which would have observable behavior since it would alter the return value of the program.

R.MartinhoFernandes argues that it is implementation detail when to throw an exception and therefore clang could decide this scenario would not cause an exception and therefore eliding the new call would not violate the as-if rule. This seems like a reasonable argument to me.

but as T.C. points out:

A replacement global operator new could have been defined in a different translation unit

Casey provided an example that shows even when clang sees there is a replacement it still performs this optimization even though there are lost side effects. So this seems like overly aggressive optimization.

Note, memory leaks are not undefined behavior.

edited May 23 '17 at 11:45

Community

1
1

answered Sep 04 '14 at 15:47

Shafik Yaghmour

154,301
39
440
740

The new'ed pointers are not deleted. Doesn't that constitute undefined behavior? – ach Sep 04 '14 at 15:51
@AndreyChernyakhovskiy according to [Are memory leaks “undefined behavior” class problem in C++?](http://stackoverflow.com/q/1978709/1708801) memory leaks are not UB. – Shafik Yaghmour Sep 04 '14 at 17:13
2

`new` could throw an exception, but it doesn't have to. In the end, it's up to the implementation to decide *when* `new` will throw an exception. The implementation can simply say it doesn't throw here and call it a day. (Consider an implementation with garbage collection, for example) – R. Martinho Fernandes Sep 04 '14 at 17:23
@R.MartinhoFernandes I can not find a section of the standard that allows `new` to not throw an exception in let's say an out of memory scenario. Can you quote me the section that allows this? I will happily update my answer if this is indeed the case. – Shafik Yaghmour Sep 04 '14 at 17:27
4

@ShafikYaghmour er, and who decides you are out of memory? It's the implementation. There's nothing to quote. You're the one that has to find a quote that says an implementation can't be smart enough to just fake memory out of thin air. (or, as would be fine in this case, just do some rudimentary garbage collection) – R. Martinho Fernandes Sep 04 '14 at 17:27
1

If you want, put it this way: the code given will not throw in a machine with enough memory, and it will end up returning 1000000. That much should be obvious, and should be enough to show that returning 1000000 is observable behaviour allowed by the standard for this program. The standard doesn't require a compiler to actually use the memory, as long as it produces the observable behaviour (by the "as-if" rule you quoted). Clang just chose to produce this one behaviour out of several possible ones. – R. Martinho Fernandes Sep 04 '14 at 17:32
@R.MartinhoFernandes fair enough, I updated my answer. Feel free to edit if you feel I misquoted you. – Shafik Yaghmour Sep 04 '14 at 17:33
2

I'm not convinced that this is allowed by as-if. A replacement global `operator new` could have been defined in a different translation unit, and whether that function is called could have observable side effects. – T.C. Sep 04 '14 at 17:42
2

@T.C. in this specific case there is no other translation unit. – Shafik Yaghmour Sep 04 '14 at 17:45
4

@ShafikYaghmour The linker may know that there is no other translation unit, but the compiler does not. [Indeed, in such a case side effects are lost.](http://coliru.stacked-crooked.com/a/c784cb368fe827b3). – Casey Sep 04 '14 at 17:47
@Casey that's a good point. The optimisation is allowed , but not at the point clang has it. – R. Martinho Fernandes Sep 04 '14 at 18:22
@Casey that is a great example, thanks for your comment. – Shafik Yaghmour Sep 04 '14 at 18:53
Could you clean up the answer? The strikeout shows history, and while amusing, it does clutter the answer up. ;) – Yakk - Adam Nevraumont Aug 07 '15 at 10:52
@Yakk good point, at this time it does not add much, at the time it was fluid so it was helpful to keep it in, removing. You are welcome to edit as well, if I disagree I will just revert ;-) – Shafik Yaghmour Aug 07 '15 at 11:43
Have you tried editing a post well on a tiny phone? Is painful. I need to get vi key bindings set up. – Yakk - Adam Nevraumont Aug 07 '15 at 18:36
@Yakk yes I have and yes it is painful. Well for future reference please feel free to edit my posts if something needs improving. I can't remember the last time I rolled back an edit. – Shafik Yaghmour Aug 07 '15 at 18:40

score 9 · Answer 2 · answered Sep 07 '14 at 09:49

The rationale is that there is no rule on how much memory a machine may have, nor does the language provide any way to examine the amount of memory allocated or free (though note that POSIX does define mallinfo). Here, we simulate your program on an abstract machine with infinite memory machine where the allocation continuously succeeds. Or at least, infinite memory for the purposes of the allocations in this loop but not consistently for a whole program. Anyhow, I'm aware of two good objections to this.

First, consider if it were malloc instead of operator new. The C99 spec states:

The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate. The malloc function returns either a null pointer or a pointer to the allocated space.

Compiling malloc() to always succeed seems to comply with that spec. But what if you call it more times than we could actually create a pointer for and only exit the loop once it fails? One possible way out is to note that there is no rule in the abstract machine definition that a 64-bit pointer can only hold 2⁶⁴ possible values, only that there is no way provided to construct values outside this range. It appears that the implementation may create such things at will. Personally I find that answer unsatisfying.

Consider that we also optimize things like "T *t1 = new T; T *t2 = (T*)rand();" by assuming that t1 may not alias t2. It doesn't matter if rand picked the right address or if you iterated across the whole address space, once we show that t1's address did not feed into t2, we should be able to conclude that they refer to different objects. While I would like that to be the way the standard worked, and that is how compilers work, I'm not aware of any standardese to support that position. This will likely become the subject of a future paper.

Second, operator new isn't malloc, it is a replaceable function. As was suggested in Casey's reply, we intend to follow the rules in N3664 (though I don't think clang treats new-expressions differently from explicit calls to operator new yet). Shafik pointed out that seems to violate causality but N3664 started life as N3433, and I'm pretty sure we wrote the optimization first and wrote the paper afterwards anyway.

+1 thank you taking the time to provide this answer, I always appreciate it when compiler implementers or members of the standards committee answer or comment on questions like this. These things are rarely documented in a way that allows full understanding of the rationale. — Shafik Yaghmour, Nov 06 '14 at 18:03

score 5 · Answer 3 · answered Sep 04 '14 at 16:20

5

It would appear that clang is optimizing memory allocations according to the changed rules in N3664 Clarifying Memory Allocation which was incorporated into C++14. N3664 allows reducing the number of calls to allocation/deallocation functions by merging allocations or eliminating allocations altogether.

answered Sep 04 '14 at 16:20

Casey

41,449
7
95
125

1

`clang 3.2` also behaves this way, so that seems unlikely unless it was back-ported. – Shafik Yaghmour Sep 04 '14 at 17:15

clang vs gcc - optimization including operator new

3 Answers3

Linked