5

As is well known, Intel had to disable TSX in the Haswell series of processors via a microcode update. This was due to a bug in the TSX implementation that could give erroneous results if these instructions were used.

What seems to be less well known is that there is apparently also an erratum affecting TSX on the newer architecture, Skylake. Specifically, the erratum "SKL-105" mentioned here:

http://www.intel.com/content/www/us/en/processors/core/desktop-6th-gen-core-family-spec-update.html

It states that using TSX can lead to unpredictable system behavior, and also notes that the BIOS may carry a fix. The question is what this fix entails: does it disable TSX altogether, like the Haswell microcode "fix"? Googling "SKL105" gives no results, so the community seems to be generally unaware of it.

Some users have noticed the TSX feature getting "stealthily" disabled (though seemingly without being aware of the erratum above):

https://www.reddit.com/r/hardware/comments/44k218/intel_disables_tsx_transactional_memory_again_in/

It is strange if only certain variants of the CPUs are affected, since one would presume they would all share the same microarchitecture and hence be equally affected by this bug.

By the way, there is another, even stealthier way such a microcode "fix" could operate. I suppose a microcode update could still expose the presence of TSX (making it seem the feature was still enabled) but override the implementation of the new TSX instructions with "dummy implementations" that never elide the locks and simply execute the code the old-fashioned way. That would avoid the bug but also forgo the performance improvement TSX could offer. The only way to determine whether this had happened would be through performance measurements.

Does anyone have more info on the status of TSX in Skylake? In any case, it is strange that so little information has been released, and that one has to guess what is affected, what is not, and indeed whether the feature is safe to use.

I have a 6700K and the feature is still there. But this also depends on whether the BIOS manufacturer has picked up the microcode updates. I also haven't actually measured the performance, so I can't rule out that it has effectively been disabled, cf. the previous paragraph.
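For anyone wanting to check what "the feature is still there" means on their own machine, here is a minimal sketch in Python. It assumes a Linux system where `/proc/cpuinfo` lists `hle` and `rtm` in its `flags` line; the function names are my own and purely illustrative. The second helper decodes the same two bits from a raw `CPUID.(EAX=07H, ECX=0):EBX` value (bit 4 = HLE, bit 11 = RTM), should you have that from another tool. Note that this only shows what the CPU advertises; it cannot tell whether transactions actually commit or always abort.

```python
def tsx_flags_from_cpuinfo(cpuinfo_text):
    """Report whether the first 'flags' line in /proc/cpuinfo-style text
    advertises the TSX features HLE and RTM."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return {"hle": "hle" in flags, "rtm": "rtm" in flags}
    # No flags line found (e.g. non-x86 or non-Linux input).
    return {"hle": False, "rtm": False}


def tsx_bits_from_cpuid_ebx(ebx):
    """Decode HLE (bit 4) and RTM (bit 11) from CPUID leaf 7, subleaf 0, EBX."""
    return {"hle": bool((ebx >> 4) & 1), "rtm": bool((ebx >> 11) & 1)}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(tsx_flags_from_cpuinfo(f.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

If both flags are reported but lock elision never seems to pay off, that would be consistent with the "dummy implementation" scenario described above, though only a timing comparison could confirm it.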

Peter Cordes
Morty
  • 1
    By the way, note that SKL-054 (in the same errata sheet) also involves TSX. The same status/comments/questions in my post for SKL-105 apply to SKL-054 as well. – Morty Aug 10 '16 at 12:11
  • Seems HLE has now been completely disabled due to new errata: https://news.ycombinator.com/item?id=21533791. Specifically, it seems XACQUIRE etc. are still present but will always behave as if eliding the lock failed (i.e. no performance advantage), which is essentially what I was hypothesizing could happen in this post 3 years ago. Intel must have had some chicken bits ;-) – Morty Dec 13 '19 at 19:38

1 Answer

6

As far as I know, it is supposedly fixed in the latest public microcode update pack from 2016-07-14. For Skylake, this would be revision 0x9d/0x9e of the Skylake base microcode (processor signatures 0x406e3 and 0x506e3).
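In case the processor signatures look opaque: they are the raw `CPUID.1:EAX` value, which packs family, model and stepping into bit fields. A minimal sketch of the standard decoding follows (the function name is mine, for illustration only):

```python
def decode_signature(sig):
    """Split a CPUID.1:EAX processor signature into
    (display family, display model, stepping)."""
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    display_family = family + ext_family if family == 0xF else family
    # The extended model is only used for families 6 and 0xF.
    if family in (0x6, 0xF):
        display_model = (ext_model << 4) | model
    else:
        display_model = model
    return display_family, display_model, stepping


if __name__ == "__main__":
    for sig in (0x406E3, 0x506E3):
        fam, mod, step = decode_signature(sig)
        print(f"{sig:#x}: family {fam}, model {mod:#x}, stepping {step}")
```

Both signatures decode to family 6, stepping 3, with models 0x4e and 0x5e respectively.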

This new TSX erratum seems to be present on Broadwell, too. I assume it was also fixed through the new batch of Broadwell-* microcode updates that were published along with the new Skylake microcode updates.

For Linux, which updates microcode through data sent by the bootloader, it is trivial to apply the update and it is already available in most (serious) distributions. For Windows, you need to pester your system vendor for an EFI/BIOS update.
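On Linux, the revision that actually got loaded can be read back from the `microcode` field of `/proc/cpuinfo`. A rough sketch of such a check is below; the helper names are made up for illustration, and the `0x9d` threshold is only my assumption based on the revisions mentioned above. Revisions are only comparable between microcode for the same processor signature.

```python
def microcode_revision(cpuinfo_text):
    """Return the first 'microcode' revision found in /proc/cpuinfo-style
    text as an int, or None if the field is absent."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            return int(value.strip(), 16)  # field looks like "0x9e"
    return None


def has_minimum_revision(cpuinfo_text, minimum):
    """True if a microcode revision is present and is at least `minimum`."""
    rev = microcode_revision(cpuinfo_text)
    return rev is not None and rev >= minimum


if __name__ == "__main__":
    ASSUMED_MINIMUM = 0x9D  # assumption: taken from the revisions quoted above
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
        print("revision:", microcode_revision(text))
        print("at least 0x9d:", has_minimum_revision(text, ASSUMED_MINIMUM))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```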

Sorry, I don't have the means to test TSX in the latest Skylake/Broadwell microcode to check whether it is eliding locks or "always failing". As for disabling TSX, you must understand it has a real impact on L3 effectiveness (it does not come for free!) and power consumption, so it would make a lot of sense to have TSX disabled by the BIOS on anything with a smaller L3.

Interestingly enough, the information on the TSX "chicken bit" is not public, so we have no idea how to disable (or re-enable) it.

anonymous
  • 1
    Do you have any references for TSX using power when enabled, even if not actually used? And also for reducing L3 effectiveness? I'd like to read more. – Peter Cordes Aug 12 '16 at 16:02
  • The papers on how TSX works make it clear it partitions the cache to preserve the transaction state in order to do the rollback. This is not a problem anyone outside of the most aggressive HPC would ever notice on the Xeons, but the lesser chips have a lot less cache. The extra power use by the feature itself is likely too small to bother with, but during the transaction, none of the low-power modes that evict cache can trigger, so Intel has to either abort the transaction or block the mode change. I don't have the references at hand; look for TSX papers and also an SGX security analysis paper... – anonymous Aug 12 '16 at 20:04
  • 3
    From what I've read, it happens inside the L1 of the core doing a transaction. According to David Kanter's [HSW writeup](http://www.realworldtech.com/haswell-cpu/5/), even L2 isn't transactional, let alone L3. He says that his speculation about how Haswell would implement it was correct; that it uses extra bits for each L1 cache line. (See his previous articles: http://www.realworldtech.com/haswell-tm/ and http://www.realworldtech.com/haswell-tm-alt/). Kanter says that TM is one of his major professional interests, so there probably aren't major errors in any of that. – Peter Cordes Aug 13 '16 at 02:17
  • Skylake might use a different implementation of TSX, but given that HSW and BDW had bugs in the implementation, they were probably still more concerned with getting a correct implementation onto developers' desktops. (Although Kanter guessed in Aug 2012 that a HSW successor like Skylake might use the memory-order-buffer technique as well as the L1-cache based method.) Either way, those make a lot more sense than anything involving L3 and rolling back multiple levels of cache. – Peter Cordes Aug 13 '16 at 02:20
  • Thanks for the details. Reading the sources you mentioned, I have to agree the transactions are implemented in L1, not L2/L3. I wonder how much this changes any assumptions made re. the cost of running with TSX enabled in cache performance and power management constraints... – anonymous Aug 15 '16 at 11:12
  • 1
    Intel probably assumes that most CPUs they sell won't be using the new feature, and would design accordingly: very little overhead when enabled but not in use. When in use, the extra storage required for metadata doesn't compete with normal data for regular L1 cache lines; instead each line has some extra bits to track its status. L1 lines already had extra bits for tracking HW virtualization ownership (so caches don't have to be flushed on every VM exit). – Peter Cordes Aug 15 '16 at 15:35
  • 3
    [This paper](http://arxiv.org/pdf/1504.04640.pdf) touches on some TSX performance details, and has some interesting conclusions, including read-only transactions with the full L3 size. It also lists interesting papers about TSX in its References section... – anonymous Aug 15 '16 at 17:21