
Consider the following code (godbolt):

```cpp
#include <optional>
#include <array>

struct LargeType {
    std::array<int, 256> largeContents;
};

LargeType doSomething();

std::optional<LargeType> wrapIntoOptional(){
    return std::optional<LargeType> {doSomething()};
}
```

As you can see, there is a function returning a large POD and then a function wrapping it into a std::optional. As visible in godbolt, the compiler emits a memcpy here, so it does not fully elide moving the object. Why is this?

If I understand it correctly, the C++ language would allow eliding the move due to the as-if rule, as there are no visible side effects to it. So it seems that the compiler really cannot avoid it. But why?

My (probably incorrect) understanding of how the compiler could optimize the memcpy out is to hand a reference to the storage inside the optional to doSomething() (as I guess such large objects get passed by hidden reference anyway). The optional itself would already lie on the stack of the caller of wrapIntoOptional due to RVO. As the definition of the constructor of std::optional is in the header, it is available to the compiler, so the compiler should be able to inline it and hand that storage location to doSomething in the first place. So what's wrong with my intuition here?

To clarify: I don't argue that the C++ language requires the compiler to inline this. I just thought it would be a reasonable optimization and given that wrapping things into optionals is a common operation, it would be an optimization that is implemented in modern compilers.

gexicide

2 Answers

5

It is impossible to elide any copy/move through a constructor call into storage managed by the constructed object.

The constructor takes the object as a reference. In order to bind the reference to something there must be an object, so the prvalue from doSomething() must be materialized into a temporary object to bind to the reference and the constructor then must copy/move from that temporary into its own storage.
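To make the materialization step visible, here is a minimal sketch that counts move constructions. `Counted`, `makeCounted`, `returnDirectly` and the two `movesWhen...` helpers are hypothetical names introduced only for this illustration; `Counted` stands in for `LargeType` but has a move constructor with an observable side effect, so the compiler is not allowed to optimize the move away under the as-if rule. It assumes C++17 or later.

```cpp
#include <cassert>
#include <optional>

// Hypothetical instrumented type (not from the question): its move
// constructor has an observable side effect, so every move is counted.
struct Counted {
    static inline int moves = 0;
    Counted() = default;
    Counted(Counted&&) noexcept { ++moves; }
};

// Returns a prvalue; the caller's result object is initialized directly.
Counted makeCounted() { return Counted{}; }

// Prvalue returned as a prvalue: guaranteed elision, no move constructor call.
Counted returnDirectly() { return makeCounted(); }

// Same shape as wrapIntoOptional in the question: the optional's converting
// constructor takes its argument by reference, so the prvalue is materialized
// into a temporary and then moved into the optional's storage.
std::optional<Counted> wrapIntoOptional() {
    return std::optional<Counted>{makeCounted()};
}

// Number of move constructions when no optional is involved.
int movesWhenReturnedDirectly() {
    Counted::moves = 0;
    Counted c = returnDirectly();
    (void)c;
    return Counted::moves;
}

// Number of move constructions when the value is wrapped into an optional.
int movesWhenWrapped() {
    Counted::moves = 0;
    std::optional<Counted> o = wrapIntoOptional();
    assert(o.has_value());
    return Counted::moves;
}
```

Under these assumptions `movesWhenReturnedDirectly()` should report no moves, while `movesWhenWrapped()` should report exactly one: the move from the materialized temporary into the optional's storage. For the trivially copyable `LargeType` from the question, that same move is what shows up as the `memcpy`.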

It is impossible to elide through function parameters. That would require knowing the implementation of the function, and C++ is specified so that each function can be compiled knowing only the declarations of other functions (aside from constant expression evaluation). Eliding through parameters would either break that property or require a new type of annotation in the declaration.

None of this prevents the compiler from optimizing in a way that doesn't affect the observable behavior though. If your compiler is not figuring out that the extra copy can be avoided and has no observable side effects when seeing all relevant function/constructor definitions, then that's something you could complain to your compiler vendor about. The concept of copy elision is about allowing the compiler to optimize away a copy/move even though it would have had observable side effects.

user17732522
  • The problem I see in your reasoning is that I'm pretty sure the compiler can inline the constructor of std::optional here and it also sees that moving the object has no visible side effects. If it couldn't, I would agree with your reasoning that "it is impossible to elide through function parameters". Once the constructor is inlined, the compiler is free to reason about memory locations in the optional object. – gexicide Nov 25 '22 at 11:11
  • @gexicide Yes it can and it can then also skip the copy if it wouldn't have observable side effects. However you can't make that mandatory because then you would force compilers to do inlining. And having an optional optimization that affects side effects would probably not be a good idea if it is dependent on the implementation of other functions. (You basically could never locally determine whether a copy constructor will actually run.) – user17732522 Nov 25 '22 at 11:14
  • So, you basically agree that this is a missed optimization opportunity in this case? The compiler could elide the memcpy here; it just doesn't do so. (I don't argue that the compiler *has to* elide the copy. I just thought it would be a reasonable and therefore implemented optimization in the latest clang, as wrapping things into optionals is a common operation) – gexicide Nov 25 '22 at 11:15
  • @gexicide I think in this case it doesn't quite work. Since you didn't give a visible definition for `doSomething` there could be some details to consider. Because of RVO it is basically possible to define `doSomething` in such a way and call `wrapIntoOptional` in such a way that the program could observe the optimization by comparing addresses of the result object of `wrapIntoOptional` with an address obtained for the return value in `doSomething`. That way there can be an observable side effect. But if you define `doSomething` in a fully visible way, then yes. – user17732522 Nov 25 '22 at 11:20
  • My example from a deleted comment wasn't actually good since it had UB. But I think it is still possible. At the very least it is difficult to prove that it isn't possible. – user17732522 Nov 25 '22 at 11:35
  • I came up with [this example](https://godbolt.org/z/orx5v48G6), which shows that the copy cannot be elided without the full context. From my reading of standard [\[expr.eq/3\]](https://eel.is/c++draft/expr.eq#3) it shouldn't be UB, and `main` should return 0, which would not be the case if the copy is elided in the original example. – IlCapitano Nov 25 '22 at 12:17
  • @IlCapitano how does your example work with the answer below (https://stackoverflow.com/a/74572802/1408611)? Would that mean the compiler did something illegal there? – gexicide Nov 25 '22 at 13:29
  • @IlCapitano The result object from `doSomething` would have been destroyed before `wrapIntoOptional` returns, so the value of the comparison is unspecified anyway. Performing the optimization wouldn't violate any standard requirement. (Also, technically, it is implementation-defined whether the invalidated pointer may be used in pointer comparison.) – user17732522 Nov 25 '22 at 16:58
  • @IlCapitano You need to perform the comparison before the function returns. At that time storage for the result `std::optional` is already obtained and the return value from `doSomething` is still alive. At this point I think there ought to be no overlap in the two storage locations, although there are still some difficulties in that case. – user17732522 Nov 25 '22 at 17:02
  • @user17732522 I don't see how the pointer comparison is a problem. From my reading of [\[expr.eq/3\]](https://eel.is/c++draft/expr.eq#3) (specifically _Otherwise, the pointers compare unequal_) the result of the comparison should be `false`. In my example `p` is an invalid pointer, but the standard doesn't say comparisons using invalid pointers are unspecified or undefined. – IlCapitano Nov 26 '22 at 01:57
  • @gexicide I did a local test with two translation units separating the definitions of `doSomething` and `wrapIntoOptional`, and indeed adding `noexcept` does make the program return `1` instead of `0` with `-O3` (but not with `-O0` or a [single translation unit](https://godbolt.org/z/Kxc43f9Pd)). Another thing I found that allows the optimization is adding the `[[gnu::pure]]` attribute to `doSomething`. – IlCapitano Nov 26 '22 at 02:04
3

You can add `noexcept` to `doSomething` to let the compiler elide the copy:

https://godbolt.org/z/rrGEfrdzc
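For readers who cannot open the link, the change can be sketched as follows. Only the `noexcept` on `doSomething` differs from the code in the question. The definition of `doSomething` at the bottom is a placeholder added here so that the sketch compiles and links on its own; in the question the function is only declared. Whether the `memcpy` actually disappears depends on the compiler, its version and the optimization level, so treat this as something to check in the generated assembly, not as a guarantee.

```cpp
#include <array>
#include <optional>

struct LargeType {
    std::array<int, 256> largeContents;
};

// The only change relative to the question: the declaration is noexcept.
LargeType doSomething() noexcept;

std::optional<LargeType> wrapIntoOptional() {
    return std::optional<LargeType>{doSomething()};
}

// Placeholder definition (an assumption for this sketch only). In the
// question's setup the definition is not visible to wrapIntoOptional.
LargeType doSomething() noexcept {
    LargeType result{};
    result.largeContents[0] = 42;
    return result;
}
```

The program's behavior is the same with or without `noexcept`; only the code the optimizer is willing to generate may differ.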

smitsyn
  • I wonder why. I can't see any reason that `noexcept` ought to change anything here. – user17732522 Nov 25 '22 at 16:58
  • What I believe is that if `doSomething` throws, the optional shouldn't be constructed, not even partially. As I see it, it's okay to put the std::array into the memory region occupied by `std::optional` even if `doSomething` throws; the compiler just does not implement the logic required to apply the optimization in the more complicated case. – smitsyn Nov 25 '22 at 21:11