If you don't need to reference the labels from outside the %rep
block, NASM's macro-local %%label syntax works:
%macro jmpfwd 0
times 21 nop
jmp %%fwd ;;;;; <<<------ This jump
add ax, 0x1234 ; can this stall decoding?
; lea eax, [ebx+edx+1]
align 64
%%fwd: ;;;;; <<<------ jumps here
%endmacro
Then use that macro inside a %rep block:
.looptop:
%rep 4
jmpfwd
%endrep
; times 4 jmpfwd   ; nope: TIMES only works on (pseudo-)instructions, not macros
dec ecx
jnz .looptop
(It turns out Skylake can decode this without an LCP (length-changing prefix) stall on every iteration. There are only a few LCP stalls, when the add hits the decoders in the same group as the jmp, before branch prediction for the unconditional jmp instructions takes effect. The times 21 nop keeps the code from fitting in the uop cache.)
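As a rough sketch of why %%fwd doesn't clash across the four expansions: NASM rewrites each %%label into a generated name of the form ..@N.label, where N changes with every macro invocation. The cut-down macro below uses the hypothetical name jmpfwd_demo, and the numbers in the comments are illustrative only; the actual values depend on how many macros were expanded earlier in the file.

```nasm
; Hypothetical cut-down version of jmpfwd, only to show label expansion.
%macro jmpfwd_demo 0
    jmp   %%fwd          ; %%fwd is local to this one expansion
    nop
%%fwd:
%endmacro

%rep 2
    jmpfwd_demo
%endrep

; Roughly what the assembler sees after preprocessing
; (the numbers are illustrative, not guaranteed):
;     jmp   ..@1.fwd
;     nop
; ..@1.fwd:
;     jmp   ..@2.fwd
;     nop
; ..@2.fwd:
```

Running nasm -E on the file should print the preprocessed source, which is one way to check the generated names. Because the names are invented by the preprocessor, code outside the macro can't reasonably refer to them, which is the limitation mentioned at the top.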