Consider this small C++ code snippet:
#include <iostream>
#include <string>
int main() {
    std::cout << std::string("This.String.Ends!") << std::endl;
}
A portion of the assembly generated by this snippet (compiled with clang++ -O3):
...
call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)
mov qword ptr [rsp + 16], rax
mov rcx, qword ptr [rsp + 8]
mov qword ptr [rsp + 32], rcx
movups xmm0, xmmword ptr [rip + .L.str]
movups xmmword ptr [rax], xmm0
mov byte ptr [rax + 16], 33 <--------- !!!!!!!!!
mov qword ptr [rsp + 24], rcx
mov rax, qword ptr [rsp + 16]
mov byte ptr [rax + rcx], 0
...
.L.str:
.asciz "This.String.Ends!"
Even though the string literal has the character '!' at the end, the generated assembly has an additional instruction to add it explicitly.
Questions:
- What is this optimization called? Is there a formal name for it?
- I am able to reproduce this behaviour with strings of size 8x+y. I can imagine that fetching only one character from memory is more expensive than using an additional instruction. Is that the case here? If so, why not inline the whole string (it's quite short in this case)?
- What are the different ways in which I can still keep -O3, but avoid this particular optimization? By trial and error, I found that using a combination of these (with g++) disables it: -fno-tree-ccp -fno-tree-dominator-opts -fno-tree-forwprop -fno-tree-fre -fno-code-hoisting -fno-tree-pre -fno-tree-vrp, but I am guessing each one does something more which I probably don't want to miss out on.
- If the compiler does generate an instruction for the last character, why still leave the complete string in the .rodata section of the binary, and not just the starting 16 bytes? Doesn't it waste space?
My use case: This optimization creates a problem for patching string literals in binaries after compiling and linking (replacing 'x' with 'y' padded with \0*(len('x')-len('y')), assuming len('x') >= len('y')). I know there may be more optimizations of this kind that would also prevent this, but I just wanted to provide some context on how I hit this issue.
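For concreteness, the kind of patching I mean can be sketched roughly as follows. This is only a minimal illustration, not my actual tool: patch_literal is a hypothetical helper name, the binary is assumed to be already loaded into a std::string, and C++17 is assumed for std::string_view.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (illustration only): overwrite every occurrence of
// `from` in `image` with `to`, padding the remaining bytes with '\0' so the
// total length of the image does not change.
// Returns the number of occurrences patched; does nothing (returns 0) when
// `from` is empty or `to` is longer than `from`.
std::size_t patch_literal(std::string& image,
                          std::string_view from,
                          std::string_view to) {
    if (from.empty() || to.size() > from.size()) {
        return 0;
    }
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = image.find(from, pos)) != std::string::npos) {
        // Write the replacement bytes over the start of the old literal.
        std::copy(to.begin(), to.end(), image.begin() + pos);
        // Zero-fill the leftover len(from) - len(to) bytes.
        std::fill(image.begin() + pos + to.size(),
                  image.begin() + pos + from.size(), '\0');
        pos += from.size();
        ++count;
    }
    return count;
}
```

A patch like this only touches the copy of the literal stored as data. In the assembly above, the final '!' is also encoded as the immediate 33 inside a mov instruction, so the string constructed at run time would still end in '!' after patching.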