
I have MASM synchronizing code for an application which runs on both Intel and AMD x86 machines.

I'd like to enhance it using the Intel TSX prefixes, specifically XACQUIRE and XRELEASE.

If I modify my code correctly for Intel, what will happen when I attempt to run it on AMD machines? Intel says that these were designed to be backwards compatible, presumably meaning they do nothing on Intel CPUs without TSX.

I know that AMD has not implemented TSX. But are these prefixes safe to run on AMD CPUs? Is this behavior documented in the AMD manuals somewhere or is it playing with fire to assume this is safe and will always be safe?

Peter Cordes
Ira Baxter

1 Answer


xacquire/xrelease are just the F2/F3 (REPNE/REP) prefix bytes, and they are safely ignored by all CPUs that don't support the feature, including non-Intel ones. That's why Intel chose that encoding for the prefixes. It's even better than a NOP, which would have to decode as a separate instruction.

In general (across vendors), CPUs ignore REP prefixes they don't understand. So new extensions can use REP as part of their encoding if it's useful for them to decode as something else on old CPUs, instead of #UD.

I don't think it's plausible for AMD to introduce an incompatible meaning for rep prefixes on locked instructions or mov-stores: that would break real-world binaries that already use these prefixes. For example, I'm pretty sure some builds of libpthread in mainstream GNU/Linux distros have used this to enable hardware lock elision, and don't use dynamic CPU dispatching to run different code based on CPUID for this.
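To illustrate what such dispatch-free code might look like, here is a minimal sketch of an HLE-hinted spinlock in C. It assumes GCC's x86-specific `__ATOMIC_HLE_ACQUIRE` / `__ATOMIC_HLE_RELEASE` memory-model flags; the `elided_lock` type and the function names are made up for this example, and defining the flags to 0 when they are missing is my own assumption so that it still builds with other compilers or targets. With the flags, GCC should emit an `xacquire`-prefixed `xchg` and an `xrelease`-prefixed `mov`, which a CPU without HLE simply runs as the ordinary instructions.

```c
/* GCC predefines these on x86 targets. Assumption for this sketch: where
   they are missing, fall back to 0 so this is just a plain spinlock. */
#ifndef __ATOMIC_HLE_ACQUIRE
#define __ATOMIC_HLE_ACQUIRE 0
#endif
#ifndef __ATOMIC_HLE_RELEASE
#define __ATOMIC_HLE_RELEASE 0
#endif

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct { int locked; } elided_lock;

static void elided_lock_acquire(elided_lock *l)
{
    /* Atomic exchange with an acquire + HLE-acquire hint.
       On a CPU without HLE the prefix is ignored and this is a normal xchg. */
    while (__atomic_exchange_n(&l->locked, 1,
                               __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)) {
        /* Lock was held: spin read-only until it looks free, then retry. */
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED))
            ;
    }
}

static void elided_lock_release(elided_lock *l)
{
    /* Plain store with a release + HLE-release hint. */
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}
```

The same binary is meant to run everywhere: no CPUID check is involved, and whether any elision actually happens depends entirely on the CPU and its microcode.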


Using REP as a mandatory prefix for a backwards-compat new instruction has been done before, e.g. with rep nop = pause or rep bsf = tzcnt. (Useful for compilers because tzcnt is faster on some CPUs, and gives the same result if the input is known non-zero.) And rep ret as a workaround for AMD pre-Bulldozer branch predictors is widely used by GCC (see "What does `rep ret` mean?"). That meaningless REP definitely works (silently ignored) in practice on AMD.

(The reverse is not true. You can't write software that counts on a meaningless REP prefix being ignored by future CPUs. Some later extension might give it a meaning, as happened with rep bsr, which runs as lzcnt and gives a different result. This is why Intel documents the effect of meaningless prefixes as "undefined".)
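To make the bsf/tzcnt and bsr/lzcnt point concrete, here is a small sketch in portable C that models the 32-bit results of the four instructions. The `model_*` names are hypothetical helpers, not intrinsics, and the value returned for a zero input in the bsf/bsr models is a placeholder, since the real instructions don't give a usable result there.

```c
#include <stdint.h>

/* BSF: index of the lowest set bit. Only meaningful for x != 0. */
static unsigned model_bsf(uint32_t x)
{
    for (unsigned i = 0; i < 32; i++)
        if (x & (1u << i))
            return i;
    return 0;   /* placeholder: real BSF gives no usable result for 0 */
}

/* TZCNT: number of trailing zero bits; defined as 32 for x == 0. */
static unsigned model_tzcnt(uint32_t x)
{
    for (unsigned i = 0; i < 32; i++)
        if (x & (1u << i))
            return i;
    return 32;
}

/* BSR: index of the highest set bit. Only meaningful for x != 0. */
static unsigned model_bsr(uint32_t x)
{
    for (int i = 31; i >= 0; i--)
        if (x & (1u << i))
            return (unsigned)i;
    return 0;   /* placeholder: real BSR gives no usable result for 0 */
}

/* LZCNT: number of leading zero bits; defined as 32 for x == 0. */
static unsigned model_lzcnt(uint32_t x)
{
    for (unsigned n = 0; n < 32; n++)
        if (x & (0x80000000u >> n))
            return n;
    return 32;
}
```

For an input of 0x10, the bsf and tzcnt models both give 4, so old and new CPUs agree. The bsr model gives 4 while the lzcnt model gives 27, so code that relied on the REP prefix being ignored would silently change behaviour on a CPU with lzcnt.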


I'd like to enhance it using the Intel TSX prefixes, specifically XACQUIRE and XRELEASE.

Unfortunately, microcode updates have apparently disabled the HLE (Hardware Lock Elision) part of TSX on all Intel CPUs, perhaps to mitigate TAA (TSX Asynchronous Abort) side-channel attacks. This was the same update that made a jcc at the end of a 32-byte block uncacheable in the uop cache, so it's hard to tell from benchmarking existing code what performance impact the loss of HLE has.

https://news.ycombinator.com/item?id=21533791 / Has Hardware Lock Elision gone forever due to Spectre Mitigation? (yes gone, but no the reason probably isn't Spectre specifically. IDK if it will be back.)

If you want to use hardware transactional memory on x86, I think your only option is RTM (xbegin/xend), the other half of TSX. OSes can disable it, too, after the most recent microcode update; I'm not sure what the default is for typical systems, and this may change in the future, so this is something to check on before putting development time into anything.

There isn't AFAIK a way to use RTM but transparently fall back to locking; xbegin / xend are illegal instructions that fault with #UD if the CPUID feature bit isn't present.
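So any RTM code path needs an explicit feature check first. A minimal sketch of that check in C, assuming GCC or clang's `<cpuid.h>` and its `__get_cpuid_count` helper on x86; the function and macro names below are made up for this example. The RTM bit (EBX bit 11 of CPUID leaf 7, sub-leaf 0) is the one the manual names for xbegin; the HLE bit (EBX bit 4 of the same leaf) is included for comparison.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1   /* hypothetical macro for this sketch */
#endif

/* Feature bits in CPUID.(EAX=7, ECX=0):EBX. */
#define CPUID7_EBX_HLE (1u << 4)
#define CPUID7_EBX_RTM (1u << 11)

static bool ebx_has_hle(uint32_t ebx) { return (ebx & CPUID7_EBX_HLE) != 0; }
static bool ebx_has_rtm(uint32_t ebx) { return (ebx & CPUID7_EBX_RTM) != 0; }

/* Returns leaf-7 EBX, or 0 when the leaf (or CPUID itself) is unavailable,
   so every feature reads as "not supported" in that case. */
static uint32_t read_cpuid7_ebx(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return ebx;
#endif
    return 0;
}

/* Only execute xbegin/xend when this returns true; otherwise take the
   ordinary locking path. */
static bool cpu_has_rtm(void)
{
    return ebx_has_rtm(read_cpuid7_ebx());
}
```

The result can be read once at startup and used to select between an RTM path and a normal lock. On AMD CPUs, which have not implemented TSX, the check should come back false.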

If you wanted transparent backwards compat, you were supposed to use HLE, so it's a real shame that it (and TSX in general) has had such a rough time, repeatedly getting disabled by microcode updates. (Previously in Haswell and Broadwell because of possible correctness bugs. It's turning into a Charlie Brown situation.)

Peter Cordes
  • I figured it was likely you who would answer :-} "microcode updates have apparently disabled the HLE" Really? Kind of makes this exercise pointless. Are the RTM primitives also "safe" to execute on the AMD hardware? I don't see how that can work considering one of them contains a branch offset. But I'd be happy to hear your response. – Ira Baxter Apr 20 '20 at 14:27
  • @IraBaxter: I haven't checked if it's possible for an OS or hypervisor to still enable HLE if they want to mitigate TAA attacks some other way, e.g. by disabling hyperthreading or only scheduling threads from the same process or user on the same phys core, and using some kind of kernel mitigation. TSX seems to be the most hard-luck story of any x86 tech; it keeps getting disabled by microcode updates after bugs are found, first in Haswell, then again in early Broadwell, and now yet again because of a security bug. IDK how practical or serious the exploit is; I haven't looked at it. – Peter Cordes Apr 20 '20 at 18:32
  • And yes, RTM is not transparently backwards compatible, unfortunately. You do have to check for feature support. xbegin (https://www.felixcloutier.com/x86/xbegin) is C7 F8, and the manual says `#UD` if `CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11] = 0` – Peter Cordes Apr 20 '20 at 18:33
  • Regarding the future existence of HLE: I was pointed to the [Intel® 64 and IA-32 Architectures Software Developer’s Manual](https://software.intel.com/content/dam/develop/public/us/en/documents/325462-sdm-vol-1-2abcd-3abcd.pdf). _2.5 INTEL INSTRUCTION SET ARCHITECTURE AND FEATURES REMOVED_ lists HLE as removed since 2019 (_This section lists Intel ISA and features that Intel has already removed for select upcoming products._) – Alex Guteniev Jun 16 '20 at 03:55
  • @PeterCordes: Given 1.5 years have elapsed since your answer, can you update your answer about the statuses of microcode disable patches? Is AMD ever going to try to implement these? – Ira Baxter Dec 27 '21 at 04:07
  • @IraBaxter: It's not something I follow news about, and haven't been doing anything using TSX myself. If I get around to it, I'll update, but if anyone else wants to do some research and update this, they're welcome to. – Peter Cordes Dec 27 '21 at 04:34