Use perf record (or another way of accessing HW perf counters) with an event like rtm_retired.aborted to catch any abort, and/or tx_mem.abort_conflict or tx_mem.abort_capacity to see whether either of those is the cause of the aborts. (You can record multiple events in one run, then see which ones fired in perf report.)
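If you'd rather script these runs, here is a minimal Python sketch. The helper names build_record_cmd and record_aborts, and the program ./tsx_test in the usage line, are hypothetical. It passes all three events as one comma-separated list to a single perf record -e, writes the samples with -o, then opens perf report -i on the result. It assumes your perf version knows these event aliases for your CPU.

```python
import shlex
import subprocess

# Abort-related events mentioned above. Whether they exist depends on the CPU
# and the perf version; check with `perf list`.
ABORT_EVENTS = (
    "rtm_retired.aborted",    # any RTM abort
    "tx_mem.abort_conflict",  # aborts due to a data conflict
    "tx_mem.abort_capacity",  # aborts due to a capacity limitation
)


def build_record_cmd(program_argv, events=ABORT_EVENTS, output="perf.data"):
    """Return the argv for a single `perf record` run sampling all `events`."""
    return ["perf", "record", "-e", ",".join(events), "-o", output,
            "--", *program_argv]


def record_aborts(program_argv, output="perf.data"):
    """Run the program under perf record, then open perf report on the result.

    perf report shows each recorded event separately, so you can see which
    of them actually collected samples.
    """
    cmd = build_record_cmd(program_argv, output=output)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
    subprocess.run(["perf", "report", "-i", output], check=True)
```

Usage: record_aborts(["./tsx_test", "10"]). Recording all the events in one run means every event sees the same execution, so comparing their sample counts in perf report is meaningful.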
Also, tx_exec.misc1..3 might be relevant. From perf list on my Skylake desktop:
tx_exec.misc1
[Counts the number of times a class of instructions that may cause a
transactional abort was executed. Since this is the count of
execution, it may not always cause a transactional abort]
tx_exec.misc2
[Counts the number of times a class of instructions (e.g., vzeroupper)
that may cause a transactional abort was executed inside a
transactional region]
tx_exec.misc3
[Counts the number of times an instruction execution caused the
transactional nest count supported to be exceeded]
See also https://oprofile.sourceforge.io/docs/intel-skylake-events.php
You might need to tweak the sampling settings (e.g. the sample period) to get a reasonable number of samples for an event that doesn't fire very often. I haven't tried this, but hopefully the samples will show up on the guilty instruction itself. rtm_retired.aborted is a precise event; the others aren't marked as precise in perf list output.
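One way to pick the sample period, sketched in Python under some assumptions: first count how often the event fires with perf stat (-x, gives comma-separated output on stderr, with the count in the first field and the event name in the third), then pass a period to perf record -c so you get roughly a target number of samples. The helper names and the default of about 1000 samples are arbitrary choices for illustration, not anything perf prescribes.

```python
import subprocess
from typing import Optional


def parse_stat_count(stat_csv: str, event: str) -> Optional[int]:
    """Extract the count for `event` from `perf stat -x,` output.

    Each counter line starts "value,unit,event-name,...". perf may append a
    modifier such as ":u" to the name. Returns None if the event is missing
    or reported as <not counted> / <not supported>.
    """
    for line in stat_csv.splitlines():
        fields = line.split(",")
        if len(fields) >= 3 and (fields[2] == event
                                 or fields[2].startswith(event + ":")):
            return int(fields[0]) if fields[0].isdigit() else None
    return None


def pick_period(total_count: int, target_samples: int = 1000) -> int:
    """Sample period giving roughly `target_samples` samples in total.

    For a rare event this bottoms out at 1, i.e. sample every occurrence.
    """
    if total_count <= 0 or target_samples <= 0:
        return 1
    return max(1, total_count // target_samples)


def record_rare_event(event: str, program_argv: list,
                      target_samples: int = 1000) -> None:
    # First pass: count the event. perf stat prints its counters on stderr;
    # no check=True because perf stat may pass on the workload's exit status.
    stat = subprocess.run(["perf", "stat", "-x,", "-e", event,
                           "--", *program_argv],
                          capture_output=True, text=True)
    total = parse_stat_count(stat.stderr, event) or 0
    period = pick_period(total, target_samples)
    # Second pass: -c fixes the sample period (one sample every `period` events).
    subprocess.run(["perf", "record", "-e", event, "-c", str(period),
                    "--", *program_argv], check=True)
```

For an event that fires only a few hundred times, the period comes out as 1, so every abort is sampled; for a frequent one it grows so perf.data stays a manageable size.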
Some of the TSX events only count HLE (Hardware Lock Elision, where you put an extra prefix on a locked instruction), not RTM.
Use perf list and search for "abort" in the output to find relevant events.
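A plain perf list | grep -i abort works, but when the match is in a wrapped description line it shows that line without the event name. This Python sketch keeps the name attached. It assumes the layout shown above (an event name on its own line, followed by a bracketed description that may wrap over several lines); find_events and list_abort_events are hypothetical helper names.

```python
import subprocess


def find_events(perf_list_text: str, keyword: str = "abort") -> list:
    """Return event names whose name or [description] mentions `keyword`."""
    keyword = keyword.lower()
    found = []
    name, desc, in_desc = None, "", False

    def flush():
        # Record the previous event if its name or description matches.
        if name and (keyword in name.lower() or keyword in desc.lower()):
            found.append(name)

    for raw in perf_list_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_desc or line.startswith("["):
            # Description text, possibly continued until the closing "]".
            desc += " " + line
            in_desc = not line.endswith("]")
            continue
        flush()
        if line.endswith(":"):  # section header such as "transaction:"
            name, desc = None, ""
            continue
        # Keep only the first token, e.g. "cpu-cycles" from "cpu-cycles OR cycles".
        name, desc = line.split()[0], ""
    flush()
    return found


def list_abort_events(keyword: str = "abort") -> list:
    """Run `perf list` and filter its output with find_events."""
    out = subprocess.run(["perf", "list"], capture_output=True,
                         text=True, check=True).stdout
    return find_events(out, keyword)
```

On a machine with TSX events, list_abort_events() should return names like rtm_retired.aborted along with events such as tx_exec.misc1 whose descriptions mention aborts.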