While benchmarking code involving std::optional<double>, I noticed that the code MSVC generates runs at roughly half the speed of the code produced by clang or gcc. After spending some time reducing the code, I found that MSVC apparently has trouble generating code for std::optional::operator=. Using std::optional::emplace() does not exhibit the slowdown.
The following function
void test_assign(std::optional<double> & f){
f = std::optional{42.0};
}
produces
sub rsp, 24
vmovsd xmm0, QWORD PTR __real@4045000000000000
mov BYTE PTR $T1[rsp+8], 1
vmovups xmm1, XMMWORD PTR $T1[rsp]
vmovsd xmm1, xmm1, xmm0
vmovups XMMWORD PTR [rcx], xmm1
add rsp, 24
ret 0
Notice the unaligned move operations (vmovups) and the temporary on the stack. In contrast, the function
void test_emplace(std::optional<double> & f){
f.emplace(42.0);
}
compiles to
mov rax, 4631107791820423168 ; 4045000000000000H
mov BYTE PTR [rcx+8], 1
mov QWORD PTR [rcx], rax
ret 0
This version is much simpler and faster.
These were generated using MSVC 19.32 with /O2 /std:c++17 /DNDEBUG /arch:AVX.
clang 14 with -O3 -std=c++17 -DNDEBUG -mavx produces
movabs rax, 4631107791820423168
mov qword ptr [rdi], rax
mov byte ptr [rdi + 8], 1
ret
in both cases.
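For reference, here is a minimal self-contained sketch of the two functions together with a check that they leave the optional in the same observable state, which is why one would expect a compiler to be able to emit the same code for both. The helper names same_result and verify_equivalence are hypothetical and only there for illustration.

```cpp
#include <cassert>
#include <optional>

// The two functions under comparison.
void test_assign(std::optional<double>& f) {
    f = std::optional{42.0};  // CTAD deduces std::optional<double>
}

void test_emplace(std::optional<double>& f) {
    f.emplace(42.0);
}

// Hypothetical helper: runs both paths from the same starting state and
// reports whether they end up engaged with equal values.
bool same_result(std::optional<double> initial) {
    std::optional<double> a = initial;
    std::optional<double> b = initial;
    test_assign(a);
    test_emplace(b);
    return a.has_value() && b.has_value() && *a == *b;
}

// Hypothetical helper: checks both an empty and an engaged starting state.
void verify_equivalence() {
    assert(same_result(std::nullopt));
    assert(same_result(1.5));
}
```

Calling verify_equivalence() from main only confirms that the two functions agree semantically; it says nothing about the instructions a particular compiler chooses.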
Replacing std::optional<double> with
struct MyOptional {
double d;
bool hasValue; // Required to reproduce the problem
MyOptional(double v) {
d = v;
}
void emplace(double v){
d = v;
}
};
exhibits the same issue. Apparently MSVC has trouble with the additional bool member.
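As a side note that may help when reading the listings: the XMMWORD store in the MSVC output writes 16 bytes, which matches the size of this struct on x86-64. The sketch below inspects the layout, assuming a typical x86-64 ABI where double is 8 bytes with 8-byte alignment. The helper names flag_offset, padding_bytes and check_x86_64_layout are hypothetical, and the numbers are an assumption about the target rather than something the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>

// Same struct as above, repeated so the sketch is self-contained.
struct MyOptional {
    double d;
    bool hasValue;
    MyOptional(double v) { d = v; }
    void emplace(double v) { d = v; }
};

// Hypothetical helper: byte offset of the flag within the object.
constexpr std::size_t flag_offset() {
    return offsetof(MyOptional, hasValue);
}

// Hypothetical helper: bytes in the object that belong to neither member.
constexpr std::size_t padding_bytes() {
    return sizeof(MyOptional) - sizeof(double) - sizeof(bool);
}

// Assumption: typical x86-64 layout (8-byte double, 1-byte bool, 8-byte alignment).
void check_x86_64_layout() {
    assert(sizeof(MyOptional) == 16);  // one XMMWORD covers the whole object
    assert(flag_offset() == 8);        // matches the [rcx+8] store in the emplace listing
    assert(padding_bytes() == 7);      // tail padding after the bool
}
```

So a single 16-byte store covers the double, the flag, and the padding, whereas the emplace version writes the double and the flag separately. This only describes the layout; it does not explain why MSVC picks the wider sequence.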
See godbolt for a live example.
Why is MSVC producing these unaligned moves? To be clear, the question is not why they are unaligned rather than aligned (which wouldn't improve things according to this post), but why MSVC produces a considerably more expensive set of instructions in the assignment case. Is this simply a bug (or a missed optimization opportunity) in MSVC, or am I missing something?