No, and it doesn't make sense to have a high-level nop operation that maps to an assembly nop. Why? Because of the as-if rule, which states that the compiler is "required to emulate (only) the observable behavior". By definition, a nop operation doesn't have any kind of observable behavior (on the "abstract machine" as defined in the standard).
Going to more practical aspects, the assembly instructions generated from a C++ source file don't have a one-to-one relation to the C++ statements. Most often a C++ statement is compiled to multiple assembly instructions; sometimes multiple C++ statements are replaced by one assembly instruction; and some C++ statements have no assembly counterpart at all because of algorithm transformations or dead code elimination. On top of that, instructions are rearranged all over the place.
For instance, one of the most radical algorithm transformations I can think of that a compiler can do is converting a recursive function to a simple iterative loop:
auto sum(int* v, int len)
{
    if (len == 0)
        return 0;
    return v[0] + sum(v + 1, len - 1);
}
A simple recursive function to compute the sum of elements of a vector.
This is what clang generates with -O1:
sum(int*, int): # @sum(int*, int)
xor eax, eax
test esi, esi
je .LBB0_2
.LBB0_1: # =>This Inner Loop Header: Depth=1
add eax, dword ptr [rdi]
add rdi, 4
add esi, -1
jne .LBB0_1
.LBB0_2:
ret
A clever algorithm transformation from recursion to a loop. Where would a C++ nop instruction fit in the assembly?
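To make the transformation concrete, here is a rough sketch of the loop that the -O1 assembly corresponds to, next to the original recursion. The names sum_recursive and sum_iterative are mine, and the iterative version is only my reading of the assembly, not something the compiler emits as source:

```cpp
// The recursive function from above, renamed for comparison (hypothetical name).
int sum_recursive(const int* v, int len)
{
    if (len == 0)
        return 0;
    return v[0] + sum_recursive(v + 1, len - 1);
}

// Hand-written approximation of the -O1 assembly (hypothetical name).
int sum_iterative(const int* v, int len)
{
    int total = 0;          // xor eax, eax
    while (len != 0) {      // test esi, esi / je ... / jne ...
        total += *v;        // add eax, dword ptr [rdi]
        ++v;                // add rdi, 4
        --len;              // add esi, -1
    }
    return total;
}
```

Both functions return the same value for any non-negative len, which is all the as-if rule requires; nothing in the loop corresponds to a particular "step" of the recursion.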
How about in the -O3 generated assembly?
sum(int*, int): # @sum(int*, int)
test esi, esi
je .LBB0_1
lea edx, [rsi - 1]
add rdx, 1
xor eax, eax
cmp rdx, 8
jae .LBB0_4
mov rcx, rdi
jmp .LBB0_7
.LBB0_1:
xor eax, eax
ret
.LBB0_4:
mov r8d, esi
and r8d, 7
sub rdx, r8
sub esi, edx
lea rcx, [rdi + 4*rdx]
add rdi, 16
pxor xmm0, xmm0
pxor xmm1, xmm1
.LBB0_5: # =>This Inner Loop Header: Depth=1
movdqu xmm2, xmmword ptr [rdi - 16]
paddd xmm0, xmm2
movdqu xmm2, xmmword ptr [rdi]
paddd xmm1, xmm2
add rdi, 32
add rdx, -8
jne .LBB0_5
paddd xmm1, xmm0
pshufd xmm0, xmm1, 78 # xmm0 = xmm1[2,3,0,1]
paddd xmm0, xmm1
pshufd xmm1, xmm0, 229 # xmm1 = xmm0[1,1,2,3]
paddd xmm1, xmm0
movd eax, xmm1
test r8d, r8d
je .LBB0_8
.LBB0_7: # =>This Inner Loop Header: Depth=1
add eax, dword ptr [rcx]
add rcx, 4
add esi, -1
jne .LBB0_7
.LBB0_8:
ret
Here the compiler unrolls and vectorises the loop. The code now has a setup part that works out how many elements fit into whole groups of eight, a main body where eight elements are added per iteration using vector instructions, and a tail that adds the remaining elements that could not fill a group one at a time. And ... the funny thing ... the original C++ source didn't even have a loop. So if you had a nop in the source code, where would you put it in the assembly? It has no observable effect, so there is no reasoning the compiler can leverage to figure out where to put it in this heavily transformed code.
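As an illustration, the shape of the -O3 output can be approximated in portable C++. This is only a sketch under my own assumptions: the name sum_blocked is hypothetical, len is assumed to be non-negative, and the two small arrays stand in for the xmm0 and xmm1 accumulator registers:

```cpp
// Scalar emulation of the vectorised structure (hypothetical name, not compiler output).
int sum_blocked(const int* v, int len)
{
    int acc_low[4]  = {0, 0, 0, 0};   // plays the role of xmm0
    int acc_high[4] = {0, 0, 0, 0};   // plays the role of xmm1

    // Number of elements that fit into whole groups of eight.
    const int main_count = len - (len % 8);

    int i = 0;
    // Main body: eight elements per iteration, four per accumulator.
    for (; i < main_count; i += 8) {
        for (int lane = 0; lane < 4; ++lane) {
            acc_low[lane]  += v[i + lane];
            acc_high[lane] += v[i + 4 + lane];
        }
    }

    // Horizontal reduction of the two accumulators into one scalar.
    int total = 0;
    for (int lane = 0; lane < 4; ++lane)
        total += acc_low[lane] + acc_high[lane];

    // Tail: the leftover elements, one at a time.
    for (; i < len; ++i)
        total += v[i];

    return total;
}
```

The additions happen in a completely different order than in the recursive source, which is exactly why a source-level nop has no natural position here.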
And even if you thought of some clever rules and managed to specify where the C++ nop would map to in the assembly, how useful would that be? An assembly nop serves purposes that do not pertain to the program's algorithm but to implementation details of the architecture, like RAW dependencies and so on. With C++ you don't model architecture details (at least not in standard C++) but architecture-agnostic algorithms, so you can't have an instruction that is exclusively related to architecture details.
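Outside standard C++, compilers do offer an escape hatch. The sketch below assumes GCC or Clang and a target whose assembler accepts the nop mnemonic (x86 and ARM do); the function names cpu_nop and burn_cycles are hypothetical, and the __asm__ __volatile__ syntax is a compiler extension, not part of the language:

```cpp
// Assumption: GCC or Clang; extended inline asm is a compiler extension.
inline void cpu_nop()
{
    // volatile asks the compiler not to delete or merge this statement.
    __asm__ __volatile__("nop");
}

// Executes `count` nop instructions and reports how many were requested.
int burn_cycles(int count)
{
    int executed = 0;
    for (int i = 0; i < count; ++i) {
        cpu_nop();
        ++executed;
    }
    return executed;
}
```

Note that this only works because it steps outside the abstract machine: the compiler treats the volatile asm as something it must not remove, not as a C++ operation with meaning of its own.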
The thing is that C++ does have a nop instruction. It is the empty statement ; and it can be useful on a C++ syntax and semantics level:
// find the end of the string:
for (const char* end = str; *end != '\0'; ++end)
    ; // <-- empty statement. A nop instruction.
But because of the aforementioned rule and reasoning, it doesn't make sense to generate an assembly nop instruction for it.
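For completeness, here is a small self-contained version of that loop. The function name string_length is my own, and the pointer is declared outside the loop so it can be used afterwards:

```cpp
#include <cstddef>

// Counts the characters before the terminating '\0' (hypothetical helper).
std::size_t string_length(const char* str)
{
    const char* end = str;
    for (; *end != '\0'; ++end)
        ; // empty statement: all the work is done in the loop header
    return static_cast<std::size_t>(end - str);
}
```

The empty statement is needed only to satisfy the grammar, which requires a loop body; a compiler would typically not emit any nop instruction for it.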