I am working on a parallel tree implementation. During profiling, I noticed a high bad-speculation rate on a movb instruction.

My data structure looks as follows:

struct data {
    enum annotation_type : std::uint8_t { none, core, region, tree_node }; // Specifies the value in the union
    access_pattern _access_pattern = writing;
    enum priority _priority = priority::normal;

    annotation_type _annotation_type = annotation_type::none;
    union {
        std::uint16_t _core;
        std::uint8_t _region;
        std::pair<node*, std::uint16_t> _tree_node = {nullptr,0};
    };
};

During traversal of the tree, I regularly update the _annotation_type and the value of the union using the following function:

void annotate(const node* node, const std::uint16_t size) noexcept
{
    _annotation_type = annotation_type::tree_node;
    _tree_node = {node, size};
}
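
For reference, here is a minimal self-contained sketch of the structure and the annotate function together. The definitions of node, access_pattern and priority are placeholders assumed for illustration (the real ones are not shown here), the pair holds a const node* so that the assignment compiles, and get_tree_node is a hypothetical accessor that is not part of the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Placeholder types: assumptions for illustration only.
struct node {};
enum class access_pattern : std::uint8_t { reading, writing };
enum class priority : std::uint8_t { low, normal, high };

struct data {
    // Tag that records which union member is currently meaningful.
    enum annotation_type : std::uint8_t { none, core, region, tree_node };

    access_pattern _access_pattern = access_pattern::writing;
    priority _priority = priority::normal;

    annotation_type _annotation_type = annotation_type::none;
    union {
        std::uint16_t _core;
        std::uint8_t _region;
        // Assumption: const node* instead of node*, to match annotate().
        std::pair<const node*, std::uint16_t> _tree_node = {nullptr, 0};
    };

    // Stores the tag first, then the pointer and size of the pair.
    void annotate(const node* n, const std::uint16_t size) noexcept
    {
        _annotation_type = annotation_type::tree_node;
        _tree_node = {n, size};
    }

    // Hypothetical accessor: only valid while the tag says tree_node.
    const std::pair<const node*, std::uint16_t>& get_tree_node() const noexcept
    {
        assert(_annotation_type == annotation_type::tree_node);
        return _tree_node;
    }
};
```

In this sketch the union member written by annotate is the same one initialized by default, so no inactive member is ever read; the tag is what tells a reader which member to use.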

clang-10 compiles the annotate method to:

movb  $0x3, 0x12(%r14)   // Set _annotation_type
movq  %rax, 0x14(%r14)   // Set the pointer part of tree_node
movw  $0x200, 0x1c(%r14) // Set the size part of tree_node, in this case it is called with annotate(node*, 512)

Using Intel VTune Profiler, I measured a CPI rate of 4.550 and a bad speculation of 53.0% for the instruction movb $0x3, 0x12(%r14), which sets the _annotation_type attribute to 0x3 (tree_node).

But how can a mov instruction show such high bad speculation? Isn't bad speculation caused by branch misprediction?

phuclv
jagemue
  • Are you reading from an inactive union member at any time? – Ted Lyngmo May 19 '20 at 12:29
  • What is an inactive union? The only knowledge of what value is stored in the union is the `_annotation_type`. – jagemue May 19 '20 at 12:35
  • If you store something in `_tree_node` the [inactive union members](https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior) are `_core` and `_region`. – Ted Lyngmo May 19 '20 at 13:01
  • Btw, the `pair` stores a `node*` but the `annotate` (member?) function tries to store a `const node*`. Does that even compile? – Ted Lyngmo May 19 '20 at 13:15
  • @TedLyngmo Ah, no, I do not access one of the inactive union members. – jagemue May 19 '20 at 13:18
  • I vaguely remember that there is a penalty when doing byte operations in some circumstances, but it's more of a read-write queue problem. – Surt May 19 '20 at 13:46
  • I note that the `std::pair _tree_node` appears to be 4-byte (not 8-byte) aligned? – Chris Hall May 19 '20 at 14:16
  • Why would you write this yourself, rather than use `std::variant`? – EOF May 19 '20 at 15:03
  • @Surt: x86 byte and 16-bit operations sometimes can cause partial register penalties, but no partial registers are involved in that instruction, only an immediate and memory. Unlike some non-x86 microarchitectures, [x86 byte stores are just as efficient as dword stores](https://stackoverflow.com/questions/46721075/can-modern-x86-hardware-not-store-a-single-byte-to-memory). I'd guess the mis-speculation might be in whatever conditionally calls this function, not due to anything the store itself is doing. Loads can cause memory-order mis-speculation machine clears, but not stores. – Peter Cordes May 19 '20 at 17:46

0 Answers