The following code
#include <iostream>

void foo() {
    std::cout << ' ';
}

void bar() {
    std::cout << " ";
}
produces the following assembly with g++ 10.2 and the -O3 option:
foo():
sub rsp, 24
mov edx, 1
mov edi, OFFSET FLAT:_ZSt4cout
lea rsi, [rsp+15]
mov BYTE PTR [rsp+15], 32
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
add rsp, 24
ret
.LC0:
.string " "
bar():
mov edx, 1
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
jmp std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
_GLOBAL__sub_I_foo():
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
Here we can see that std::__ostream_insert is called in both cases, but in two different ways: with call in foo() and with jmp in bar(). In foo(), the space character is first written to the stack (mov BYTE PTR [rsp+15], 32), and the function is then called with that address. Because the character lives on the stack, space for it must be allocated beforehand (sub rsp, 24) and released afterwards (add rsp, 24). That is why call is used in foo() instead of the lighter jmp: the stack has to be cleaned up after the function returns, so a tail call is not possible. So, contrary to expectations, printing a single character requires more work than printing a string literal.
Why does this happen? Why isn't the character stored in read-only data, the way the string literal is? Why didn't the optimizer choose a char-specific function to call?
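One thing that may be relevant to the last question: operator<< for a char is a formatted output function, so it has to honor the stream's width and fill just like the string overload does, whereas std::ostream::put is the unformatted, char-specific member. The following is only a minimal sketch of that difference, not an explanation of what libstdc++ does internally; the helper names via_char, via_literal and via_put are made up for illustration, and std::ostringstream stands in for std::cout so the result can be inspected.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formatted insertion of a single char: width and fill are applied.
std::string via_char() {
    std::ostringstream os;
    os << std::setw(3) << std::setfill('*') << ' ';
    return os.str();
}

// Formatted insertion of a one-character string literal: same padding rules.
std::string via_literal() {
    std::ostringstream os;
    os << std::setw(3) << std::setfill('*') << " ";
    return os.str();
}

// Unformatted output through put(): the pending width is ignored.
std::string via_put() {
    std::ostringstream os;
    os << std::setw(3) << std::setfill('*');
    os.put(' ');
    return os.str();
}
```

Calling the three helpers shows that the char and string-literal insertions produce the same padded text, while put() writes just the one character. That would be consistent with both foo() and bar() ending up in the same formatted-insert helper, though it does not by itself explain why the character is placed on the stack.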