When going the route of an interlocked operation on a dummy location, there are a few things to consider:
1. The location should be in the L1d cache of this core.
2. The location should not be in use by other cores.
3. The operation should not create long dependency chains.
4. The operation should avoid a stall due to a store-forwarding miss.
Without the context, anything is only a guess, so the goal is to make the best guess.
A place near the top of the stack is a good guess for 1 and 2.
A deliberately allocated stack variable is likely to fix 3, and since there are no other stores in flight, 4 is not a problem. The best operation looks like `lock not`.
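As a rough illustration of the dedicated-variable idea, here is a minimal sketch in GNU C inline assembly (AT&T syntax). The function name `fence_lock_not_local` is hypothetical, and the non-x86 branch is only an assumption added so the snippet still builds elsewhere. The variable is initialized to keep the compiler quiet; that store has the same size and address as the locked operation, so it should not cause a forwarding miss.

```c
/* Sketch: full barrier via LOCK NOT on a byte that exists only for this purpose.
 * fence_lock_not_local is a hypothetical name, not taken from any library. */
static inline void fence_lock_not_local(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned char dummy = 0;  /* dedicated stack byte; its value is irrelevant */
    /* NOT does not touch EFLAGS; the "memory" clobber stops compiler reordering. */
    __asm__ __volatile__("lock notb %0" : "+m"(dummy) : : "memory");
#else
    /* Assumed fallback for non-x86 targets, only so that the sketch compiles. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}
```

Because the byte is private to the call, flipping its bits has no visible effect, which is what makes a destructive operation like `not` acceptable here.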
Not allocating a stack variable requires the operation to be effectively a no-op, so `lock or [mem], 0` is a good option. The operand should be a byte to avoid problems with 4. For 3, it is always a guess. (The return address could have been used, but assembly without the context does not know where it is. With MSVC, `_AddressOfReturnAddress` may be a good idea.)
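The no-op variant can be sketched the same way. This is a hedged example in GNU C inline assembly; the name `fence_lock_or_stack_top` is hypothetical, and using the byte at the stack pointer itself is just one possible guess at a hot, core-private location.

```c
/* Sketch: full barrier via a locked OR of 0 into the byte at the stack pointer.
 * OR with 0 leaves the byte unchanged, so live data there is not corrupted.
 * fence_lock_or_stack_top is a hypothetical name. */
static inline void fence_lock_or_stack_top(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock orb $0, (%%rsp)" : : : "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock orb $0, (%%esp)" : : : "memory", "cc");
#else
    /* Assumed fallback for non-x86 targets, only so that the sketch compiles. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}
```

Whether the chosen byte has a pending wider store or sits at the end of a dependency chain cannot be known from inside the helper, which is why this remains a guess.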
I've read about the red zone. Its absence on Windows enables extra optimizations.
`lock not byte ptr [esp-1]` without an extra variable is good on Windows, since the data below the stack pointer is considered volatile and should not be relied on. There are no spilled registers there, so there is no false data dependency.
An ABI with a 128-byte red zone precludes the use of `lock not byte ptr [esp-1]`. 128 bytes beyond the stack pointer is likely far enough to not be in L1d. Still, since the red zone is not as likely to be in use as the usual stack, the answer given by @Peter Cordes looks good.
TSX is questionable primarily because it may be absent (unsupported on a given CPU, or disabled as a result of an erratum fix or a security mitigation). Only RTM will exist in the foreseeable future (Has Hardware Lock Elision gone forever due to Spectre Mitigation?). According to the RTM overview, an empty RTM transaction is still a fence, so it can be used.
A successfully committed RTM region consisting of an XBEGIN followed by an XEND, even with no memory operations in the RTM region, has the same ordering semantics as a LOCK prefixed instruction.
Beware of failed transactions or unsupported RTM. The pseudocode seems to be as follows:
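if (rtm_supported && _xbegin() == 0xFFFFFFFF)  /* 0xFFFFFFFF is _XBEGIN_STARTED */
    _xend();                 /* empty transaction committed: ordering of a LOCK-prefixed instruction */
else
    dummy_interlocked_op();  /* RTM unavailable, or the transaction aborted */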
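A fuller version of this pseudocode might look like the sketch below, assuming GCC or Clang on x86. Here `rtm_supported` is written as a hypothetical helper that checks CPUID leaf 7 (EBX bit 11), `dummy_interlocked_op` is a stand-in locked no-op, and `fence_rtm_or_locked` is an illustrative name; the `target("rtm")` attribute is used so that the intrinsics compile without `-mrtm`.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __get_cpuid_count (GCC/Clang) */
#include <immintrin.h>  /* _xbegin, _xend, _XBEGIN_STARTED */

/* Hypothetical helper: CPUID.(EAX=7,ECX=0):EBX bit 11 reports RTM. */
static bool rtm_supported(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;                 /* leaf 7 not available */
    return ((ebx >> 11) & 1u) != 0;
}

/* Stand-in fallback: locked RMW on a dedicated stack byte. */
static void dummy_interlocked_op(void)
{
    unsigned char dummy = 0;
    __asm__ __volatile__("lock notb %0" : "+m"(dummy) : : "memory");
}

/* Returns true if the fence came from a committed empty transaction. */
__attribute__((target("rtm")))
static bool fence_rtm_or_locked(void)
{
    if (rtm_supported() && _xbegin() == _XBEGIN_STARTED) {
        _xend();                      /* committed: acts as a fence */
        return true;
    }
    dummy_interlocked_op();           /* unsupported RTM or aborted transaction */
    return false;
}
#else
/* Assumed fallback for non-x86 targets, only so that the sketch compiles. */
static bool rtm_supported(void) { return false; }
static bool fence_rtm_or_locked(void)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    return false;
}
#endif
```

In real code the CPUID result would probably be cached rather than queried on every call, since CPUID is itself a slow, serializing instruction.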