Nothing happens with RCX=0; rep prefixes do check for zero first, like the pseudocode says. (This is unlike the loop instruction, which behaves exactly like the bottom of a do{}while(--rcx), or like a dec rcx / jnz pair that doesn't affect FLAGS.)
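The difference between "test first" and "decrement then test" is easier to see side by side. Here is a minimal C model of rep stosb (with DF=0) next to the loop instruction. It is my reading of the pseudocode's control flow, not a description of how the microcode actually works. The names model_rep_stosb and model_loop_iterations are hypothetical. The cap parameter exists only so that the RCX=0 case of loop ends in a demo; the real instruction would run the body 2^64 times.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of rep stosb (DF=0): the count is tested *before*
   each iteration, so RCX=0 stores nothing and leaves RDI unchanged. */
static void model_rep_stosb(uint8_t **rdi, uint8_t al, uint64_t *rcx) {
    while (*rcx != 0) {
        **rdi = al;     /* store AL to [RDI] */
        (*rdi)++;       /* RDI advances (DF=0) */
        (*rcx)--;
    }
}

/* Hypothetical model of a loop instruction at the bottom of a loop body:
   decrement first, then test, like do{ body }while(--rcx).
   With RCX=0 the decrement wraps to 2^64-1, so the body would run 2^64
   times; cap only stops this demo early. Returns how often the body ran. */
static uint64_t model_loop_iterations(uint64_t rcx, uint64_t cap) {
    uint64_t iterations = 0;
    assert(cap > 0);
    do {
        iterations++;               /* the body always runs at least once */
        if (iterations == cap)
            break;
    } while (--rcx != 0);
    return iterations;
}
```

With a small buffer and RCX=0, model_rep_stosb leaves both the buffer and RDI untouched. model_loop_iterations(3, 1000) returns 3, but model_loop_iterations(0, 1000) only stops because it hits the artificial cap.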
I think I've heard of this being used, rarely, as an idiom for a conditional load or store: rep lodsw or rep stosw with a count of 0 or 1, especially in the bad old days before cmov. (cmov with a memory source operand does an unconditional load feeding an ALU select operation, so it needs a valid address, unlike rep lods with a count of zero.) This is not efficient, especially for rep stos on modern x86 with Fast Strings microcode (P6 and later), and especially on CPUs without anything like Fast Short Rep-Movs (Ice Lake, IIRC).
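A minimal C sketch of why the count-of-0-or-1 trick can act as a conditional load while cmov can't: in the rep lodsw model, [RSI] is only dereferenced when the count is non-zero, whereas the cmov model always loads and then selects. Both functions are hypothetical models of the semantics described above (DF=0, only AX tracked), not real intrinsics or a performance recommendation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of rep lodsw with RCX = cond ? 1 : 0.
   [RSI] is only read when the count is non-zero, so RSI may be an
   invalid pointer when the condition is false; AX keeps its old value. */
static uint16_t model_rep_lodsw(const uint16_t *rsi, uint64_t rcx, uint16_t ax) {
    while (rcx != 0) {
        ax = *rsi++;    /* load [RSI] into AX, advance RSI */
        rcx--;
    }
    return ax;
}

/* Hypothetical model of cmovcc ax, [rsi]: the load happens unconditionally,
   then an ALU select picks either the loaded value or the old AX.
   RSI therefore has to be valid even when cond is false. */
static uint16_t model_cmov_load(const uint16_t *rsi, int cond, uint16_t ax) {
    uint16_t loaded = *rsi;         /* always executed */
    return cond ? loaded : ax;      /* select */
}
```

model_rep_lodsw(NULL, 0, 7) returns 7 without touching memory, while calling model_cmov_load with a NULL pointer would be undefined behaviour in C, much like the real cmov faulting on an invalid address.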
The same applies to instructions that treat the prefixes as repz / repnz (cmps/scas) instead of as an unconditional rep (lods/stos/movs). Doing zero iterations means they leave FLAGS unmodified.
If you want to check FLAGS after a repe/repne cmps/scas, you need to make sure the count was non-zero, or that FLAGS was already set such that you'll branch in a useful way for zero-length buffers. (Perhaps from xor-zeroing a register that you're going to want later.)
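Here is a minimal C model of repe cmpsb (DF=0) that tracks only ZF. It shows the pitfall: with RCX=0, ZF keeps whatever stale value an earlier instruction left behind. It also shows the fix of presetting ZF=1 the way xor-zeroing a register would, so a zero-length compare comes out as "equal". The names model_repe_cmpsb and model_buffers_equal are hypothetical, and modelling ZF as a bool is a simplification of the real FLAGS register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of repe cmpsb (DF=0), tracking only ZF.
   Each iteration compares one byte pair, advances both pointers,
   decrements RCX, and stops early on a mismatch (ZF=0).
   With RCX=0 nothing runs, so *zf keeps its previous (possibly stale) value. */
static void model_repe_cmpsb(const uint8_t **rsi, const uint8_t **rdi,
                             uint64_t *rcx, bool *zf) {
    while (*rcx != 0) {
        *zf = (**rsi == **rdi);
        (*rsi)++;
        (*rdi)++;
        (*rcx)--;
        if (!*zf)
            break;
    }
}

/* Equality check that presets ZF=1 first, like the flags left by
   xor eax, eax, so len == 0 reports "equal" instead of stale flags.
   The return value corresponds to taking a je after the string instruction. */
static bool model_buffers_equal(const uint8_t *a, const uint8_t *b, uint64_t len) {
    bool zf = true;   /* ZF=1, as after xor-zeroing a register */
    model_repe_cmpsb(&a, &b, &len, &zf);
    return zf;
}
```

In real asm, the xor-zeroing has to be the last flag-writing instruction before the repe cmpsb for this to work.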
rep movs and rep stos have fast-strings microcode on CPUs since P6, but the startup overhead means they're rarely worth it, especially when sizes can be short and/or data might be misaligned. They're more useful in kernel code, where you can't freely use XMM registers. Some recent CPUs like Ice Lake have fast-short-rep microcode that I think is supposed to reduce startup overhead for small counts.
repe/repne scas/cmps do not have fast-strings microcode on most CPUs, only on very recent ones like Sapphire Rapids and maybe Alder Lake P-cores. So they're quite slow: roughly one load per clock cycle (so about 2 cycles per count for cmpsb/w/d/q, which loads from both [rsi] and [rdi] each iteration), according to testing by https://agner.org/optimize/ and https://uops.info/.