1

I'm trying to implement an atomic tagged/packed pointer, for the sake of learning.

I want to use the upper 16 bits for a uint16_t counter, and the lower 3 bits for a 3-bit tag.

So far, I've managed to get everything working except the ability to increment the counter. I'm not very familiar with bitwise operations, so I assume the mistake is probably somewhere in my use of them.

The implementation I have is below:

The current output is:

AtomicTaggedPointer(ptr=0x5589e2dc92b0, val=42, tag=4, count=0) 
0000000000000000 0101010110001001 1110001011011100 1001001010110 100

AtomicTaggedPointer(ptr=0x5589e2dca2e0, val=43, tag=5, count=0) 
0000000000000000 0101010110001001 1110001011011100 1010001011100 101

What I'm trying to achieve is that count=1 when it's printed the second time, and that we see that stored in the upper 16 bits.

#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdio>

// A word-aligned, atomic tagged pointer.
// Uses both the upper 16 bits for storage, and the lower 3 bits for tagging.
//
//   64                48                32                16
// 0xXXXXXXXXXXXXXXXX  0000000000000000  0000000000000000  0000000000000XXX
//   ^                 ^                                                ^
//   |                 |                                                +-- Tag (3 bits)
//   |                 +-- Pointer (48 bits)
//   +-- Counter (16 bits)
//
//
// The tag is 3 bits, allowing for up to 8 different tags.
//
// The version is incremented every time the pointer is CAS'd. This is useful
// for detecting concurrent modifications to a pointer.
template <typename T>
struct AtomicTaggedPointer
{
    static_assert(sizeof(T*) == 8, "T* must be 8 bytes");
    static_assert(sizeof(std::atomic<uintptr_t>) == 8, "uintptr_t must be 8 bytes");

  private:
    static constexpr uintptr_t kTagMask      = 0x7;                // 3 bits
    static constexpr uintptr_t kCounterMask  = 0xFFFF000000000000; // 16 bits
    static constexpr uintptr_t kPointerMask  = ~kTagMask;          // All bits except tag bits
    static constexpr uintptr_t kCounterShift = 48;                 // Shift counter bits to the left

    std::atomic<uintptr_t> value;

  public:
    AtomicTaggedPointer(T* ptr, uint8_t tag = 0)
    {
        value.store(reinterpret_cast<uintptr_t>(ptr) | tag, std::memory_order_relaxed);
    }

    T* get() const
    {
        return reinterpret_cast<T*>(value.load(std::memory_order_relaxed) & kPointerMask);
    }

    uint8_t tag() const
    {
        return value.load(std::memory_order_relaxed) & kTagMask;
    }

    uint16_t counter() const
    {
        return value.load(std::memory_order_relaxed) >> kCounterShift;
    }

    // Compare and swap the pointer with the desired value, and optionally set the tag.
    // Returns true if the swap was successful.
    // Also increments the version counter by 1.
    bool cas(T* desired, uint8_t tag = 0)
    {
        uintptr_t expected = value.load(std::memory_order_relaxed);
        uintptr_t desired_value =
            reinterpret_cast<uintptr_t>(desired) | (tag & kTagMask) | ((expected + 1) & kCounterMask) << 48;
        return value.compare_exchange_strong(expected, desired_value, std::memory_order_relaxed);
    }

    void print() const
    {
        printf("AtomicTaggedPointer(ptr=%p, val=%d, tag=%hhu, count=%hu) \n", get(), *get(), tag(), counter());
        // Print each bit of the pointer's 64-bit value
        // In the format:
        // 0xXXXXXXXXXXXXXXXX  0000000000000000  0000000000000000  0000000000000XXX
        uintptr_t v = value.load(std::memory_order_relaxed);
        for (int i = 63; i >= 0; i--)
        {
            if (i == 2 || i == 15 || i == 31 || i == 47)
            {
                printf(" ");
            }
            printf("%lu", (v >> i) & 1);
        }
        printf("\n");
    }
};

int
main()
{
    AtomicTaggedPointer<int> p = AtomicTaggedPointer<int>(new int(42), 4);
    p.print();
    assert(p.get() != nullptr);
    assert(*p.get() == 42);
    assert(p.tag() == 4);
    assert(p.counter() == 0);

    int* expected = p.get();
    p.cas(new int(43), 5);
    p.print();
}
Gavin Ray
  • 595
  • 1
  • 3
  • 10
  • 1
    "*I want to use the upper 16 bits for a uint16_t counter, and the lower 3 bits for a 3-bit tag.*" That's only 19 bits. – Nicol Bolas Dec 18 '22 at 20:57
  • 1
    `cas`: This operation is not atomic, and as you've spelled it out, it is unlikely to be *able* to be atomic. That's kind of dangerous. – Nicol Bolas Dec 18 '22 at 20:59
  • The other 48 bits are taken up by the actual pointer, you can't use them. The upper 16 bits are available because of 4-level page table address translation (Intel has an extension for 5-level paging which removes some of the bits), and the lower 3 bits are available on word-aligned addresses, as they're always set to 0. See: https://en.wikipedia.org/wiki/Tagged_pointer#Folding_tags_into_the_pointer – Gavin Ray Dec 18 '22 at 21:00
  • 1
    So what happens if someone gives your code a pointer to a `char`? Or some other pointer value that is not 8-byte aligned? – Nicol Bolas Dec 18 '22 at 21:04
  • 1
    How is it not atomic? With "-O" the ".cas()" is a "cmpxchg" instruction, it becomes "lock cmpxchg qword ptr [rsp], rcx" – Gavin Ray Dec 18 '22 at 21:07
  • The static_assert will fail and the code will not compile if it's given a non-aligned pointer – Gavin Ray Dec 18 '22 at 21:07
  • "*How is it not atomic?*" Two atomic operations in sequence are not, in aggregate, atomic. That your compiler *may* compile them into an atomic operation does not mean that they are atomic as far as C++ (aka: any other compiler) is concerned. – Nicol Bolas Dec 18 '22 at 21:08
  • I would appreciate if someone would just tell me how to increment the counter instead of trying to tell me why my code is not good lol – Gavin Ray Dec 18 '22 at 21:08
  • 1
    "*The static_assert will fail and the code will not compile if it's given a non-aligned pointer*" Which `static_assert`? I'm talking about the *value* of a pointer, not the *size* of the pointer. You cannot `static_assert` on a runtime pointer value. – Nicol Bolas Dec 18 '22 at 21:10
  • @user17732522 Do you want to submit that as an answer? I'll accept it -- thank you! – Gavin Ray Dec 18 '22 at 21:17
  • 1
    @NicolBolas: a CAS retry loop can synthesize any atomic operation on a single variable, including replacing two bitfields. It's weird that the function itself is called `cas()` and makes only one attempt to compare-and-swap, but there's no fundamental problem with the idea. It's the same as using a struct of two variables for an ABA counter, except only needs one pointer-width object so you can get efficient atomic loads on more machines without [wrestling with the compiler to only load the pointer](//stackoverflow.com/q/38984153), not also the counter, if you're just reading not CASing. – Peter Cordes Dec 18 '22 at 22:06
  • https://en.wikipedia.org/wiki/Compare-and-swap . For an example of synthesizing an atomic operation that isn't directly supported by hardware, @NicolBolas, see [Atomically clearing lowest non-zero bit of an unsigned integer](https://stackoverflow.com/q/51346815) for a working example and the resulting x86-64 asm. The code in the question is correct. Same for [Atomic double floating point or SSE/AVX vector load/store on x86\_64](https://stackoverflow.com/q/45055402) - `atomic` under the hood uses CAS. – Peter Cordes Dec 18 '22 at 22:12

1 Answers1

2

You increment expected by adding 1. That would be an increment at the lowest bit. But that is where you put the tag. The counter is at the highest bits. So you need to shift 1 first to the lowest counter bit, i.e. by kCounterShift (and to do so you should first cast 1 to the appropriate type uintptr_t, so that the shift is sure to be in-range).

Also, your kPointerMask is wrong as it doesn't mask the counter bits.

Also, to make sure that the pointers are suitably aligned you should add a static assert on alignof(T) being large enough. Then you should be fine, since it is not possible to pass valid unaligned pointer values for a type in standard C++ and even on platforms where unaligned access is allowed, it is not allowed to pass and dereference incorrectly aligned pointers like this. (see comments under the answer)

In your specific example int is not going to satisfy that requirement. It will have an alignment of only 4 bytes, not the 8 bytes you need. That you are using new to create the objects is probably saving you from getting pointers that are not 8 byte aligned, but even for new int this is not guaranteed in general.

Also, of course a lot here is implementation-defined behavior. For example the layout of the bits in the pointer used and how they map to uintptr_t is specific to x86-64 and a typical ABI. I could imagine a compiler using the unused bits in the pointer itself for some purpose or similar.

There might also be a broader question of whether the whole approach is compatible with pointer provenance, which I am not sure about.

user17732522
  • 53,019
  • 2
  • 56
  • 105
  • Thank you! This sounds right to me, I've tried this however: uintptr_t desired_value = (expected & kPointerMask) | ((expected + (1 << kCounterShift)) & kCounterMask) | (tag & kTagMask); and it does set the other values but for some reason not the counter – Gavin Ray Dec 18 '22 at 21:34
  • @GavinRay The type of `1` is wrong. Cast it to `uintptr_t` first. – user17732522 Dec 18 '22 at 21:36
  • That gives "Program returned: 139" so the masks may actually be wrong that's it using, hmm =/ – Gavin Ray Dec 18 '22 at 21:37
  • @GavinRay Because your `kPointerMask` is wrong. It doesn't take into account the counter bits. – user17732522 Dec 18 '22 at 21:43
  • Ohhhh, that would explain a lot... – Gavin Ray Dec 18 '22 at 21:45
  • 1
    @GavinRay: `1ULL << 48` is the standard way to write stuff like that. If `unsigned long long` is wider than `uintptr_t`, that's fine: it's a compile-time constant value so the compiler can still see what's going on. – Peter Cordes Dec 18 '22 at 22:09
  • 1
    user17732522 (and @GavinRay): You shouldn't be using `T*` that doesn't satisfy `alignof(T)`. It's undefined behaviour to dereference, and can break in practice even on ISAs that allow unaligned loads in asm; see [Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?](https://stackoverflow.com/q/47510783) for one case and links to a couple blogs about others. Use `memcpy` or GNU C `typdef uint64_t unaligned_u64 __attribute__((aligned(1), may_alias))` to write safe/correct code for unaligned loads. So using the low 3 bits of pointers should be safe with `alignof(T) >= 8`. – Peter Cordes Dec 18 '22 at 22:18
  • @PeterCordes There would be a difference if the shift is further than the width of `unsigned long long` which I guess is in theory is allowed (though not for 64bit or below and doesn't apply with the address layout in the question anyway). I just like to cast everything to the target type when doing operations like this. (Where by cast I mean something like `uintptr_t{1}`, not a C style cast.) – user17732522 Dec 18 '22 at 22:28
  • @PeterCordes Fair enough about passing unaligned pointers. I wasn't sure whether there is any ABI/compiler where it is allowed to pass unaligned pointers like this. – user17732522 Dec 18 '22 at 22:29
  • Oh, that reminds me, MSVC's `alignof(T)` violates the standard in some cases, IIRC. ISO C++ says `alignof(T)` is supposed to be the minimum alignment for any possible T, otherwise undefined behaviour. But MSVC has `alignof(uint64_t) == 8`, but in 32-bit code doesn't always align it that much: [Why is the "alignment" the same on 32-bit and 64-bit systems?](https://stackoverflow.com/q/55920103) . (Even without taking pointers to members of `packed` structs, which can also be alignment UB if the access doesn't go through a struct object.) – Peter Cordes Dec 18 '22 at 22:47
  • So yeah, part of the contract of this class is that the pointer type must be aligned by 8 at least, and it's not a bad idea to check that with an assert for debug builds, in case programs accidentally have misaligned `T*` that otherwise happens to work (because that is often the case on ISAs where unaligned loads just work in asm; only compiler optimizations / assumptions rely on alignment on targets like that). – Peter Cordes Dec 18 '22 at 22:50
  • @PeterCordes: "*it is not allowed to pass and dereference incorrectly aligned pointers like this.*" That's the problem: there actually *aren't* that many types `T` which are explicitly aligned to 8 bytes. Obviously any type holding a pointer/reference or having a vtable has to be, but there are a *lot* of types that don't. Indeed, the example presented in the question *would not work*, since `int` isn't required to be aligned to 8 bytes. You couldn't even use a `void*`. It seems to me that this needs to be a runtime check, not a compile-time one. – Nicol Bolas Dec 18 '22 at 23:34
  • @NicolBolas: Yes, agreed, runtime checks are needed in a debug build. A `static_assert( alignof(T) > =8 )` would be a good idea, unless the OP wants to use it for `int*` to over-aligned allocations, like you'd get from `malloc` (or `new` on most C++ implementations). My earlier reply was just addressing now-changed text in this answer that talked about code using misaligned `T*` on x86-64 and getting away with it, pointing out that's already broken in practice on real compilers, and thus not a kind of rule-bending people should be doing. – Peter Cordes Dec 19 '22 at 01:05
  • Re: making assumptions about uintptr_t object-representation vs. `T*`; using pointer bits only works if you know about the ISA's pointer representation. e.g. that you're on x86-64 without PML5 enabled (57-bit virtual addresses). So this is not intended to be portable in ISO C++ in general, although in practice it will be for a fair few 64-bit ISAs if you do it right. – Peter Cordes Dec 19 '22 at 01:07