
I need this to implement a lock-free linked list. An AtomicMarkableReference is an object from the java.util.concurrent.atomic package that encapsulates both a reference to an object of type T and a boolean mark. These fields can be updated atomically, either together or individually.

Thank You.

Muhammad Safi
  • I do not think there is an exact equivalent. Wouldn't it be enough to use an atomic pointer and use nullptr as the boolean false? – J. Calleja Oct 25 '16 at 19:41
  • Can you please elaborate? I need an instruction which can set a reference AND a boolean at the same time. – Muhammad Safi Oct 25 '16 at 23:24
  • You can use `compare_exchange_weak` on a struct/class with a pointer and a boolean member. (I'd suggest storing the boolean in a `uintptr_t` to make sure it's the same size as the pointer, so you don't have padding bytes as part of your objects). See [this answer](http://stackoverflow.com/questions/38984153/implement-aba-counter-with-c11-cas/38991835#38991835) for how to get compilers to make efficient code for `compare_exchange_weak` on a `std::atomic` that's the size of two pointers. – Peter Cordes Oct 26 '16 at 02:50

2 Answers


Assuming the object's alignment is greater than 1, you may use the lowest bit of the pointer as a boolean mark:

#include <cstdint>

template<class T>
class MarkableReference
{
private:
    uintptr_t val;                     // the pointer bits, with the mark in the lowest bit
    static const uintptr_t mask = 1;
public:
    MarkableReference(T* ref = nullptr, bool mark = false)
    {
        val = ((uintptr_t)ref & ~mask) | (mark ? 1 : 0);
    }
    T* getRef(void)
    {
        return (T*)(val & ~mask);      // clear the mark bit to recover the pointer
    }
    bool getMark(void)
    {
        return (val & mask);
    }
};

To perform atomic operations, you need to create an atomic variable from this class. E.g., if the type of the object pointed to by the reference should be int, you may create this variable:

atomic<MarkableReference<int>> mRef;

To the variable mRef you may apply any operation available for a normal atomic. For example, compare-and-set (CAS):

int a;
int b;
...
// Atomically change the pair (&a, true) to (&b, false).
MarkableReference<int> old(&a, true);
if(mRef.compare_exchange_strong(old, MarkableReference<int>(&b, false)))
{
    // Successful CAS
}
else
{
    // Unsuccessful CAS: 'old' now contains the value actually stored.
}
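
Java's attemptMark() (set the mark only if the stored reference matches an expected one) maps onto the same CAS. A minimal sketch, assuming the MarkableReference class above (the free function attemptMark is illustrative, not part of any standard API):

#include <atomic>

// Set only the mark, and only if the stored reference equals expectedRef,
// analogous to Java's AtomicMarkableReference.attemptMark().
template<class T>
bool attemptMark(std::atomic<MarkableReference<T>>& mref, T* expectedRef, bool newMark)
{
    MarkableReference<T> old = mref.load();
    if(old.getRef() != expectedRef)
        return false;  // the stored reference differs, so fail like Java does
    // The CAS can still fail if another thread changed the pair since the load.
    return mref.compare_exchange_strong(old, MarkableReference<T>(expectedRef, newMark));
}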
Tsyvarev
  • Hey, thanks for your answer. I'm sort of new to C++ so I'm having a little bit of trouble. Can you tell me how I can adapt this to a situation where I have a Node class, and the next field of the node is a markable reference containing 1) a pointer to another node, and 2) a boolean flag? I want to be able to atomically change the reference AND the boolean flag using compareAndSet operations. – Muhammad Safi Oct 26 '16 at 02:35
  • See also http://stackoverflow.com/questions/38984153/implement-aba-counter-with-c11-cas/38991835#38991835 for cmpxchg on a two-member struct using portable c++11 std::atomic, in case you need to use this with pointers that have all their bits significant (e.g. `char*`). – Peter Cordes Oct 26 '16 at 02:55
  • On x86-64, you could use this with unaligned pointers by storing the boolean somewhere above bit 48 of the address (instead of the low bit), because x86-64 currently requires that the high bits of the address are sign-extended from the low 48. (The MSB would be a good choice, since you could re-generate it with a left shift and then an arithmetic right shift.) – Peter Cordes Oct 26 '16 at 02:56
  • @IrtazaSafi: This answer is missing an implementation of `setMark`. I think probably the `uintptr_t val;` should be atomic, instead of using `atomic<MarkableReference>`. That would let member functions do atomic AND/OR/XOR operations on the `val` member to modify the boolean atomically. – Peter Cordes Oct 26 '16 at 02:58
  • Alright, so my use case is this: struct AtomicReference { Node* next; bool flag; }. Let's say I have atomic a (a.next = 0x100, a.flag = false), atomic b (b.next = 0x110, b.flag = true), and atomic c (c.next = 0x100, c.flag = false). Now I want to do a CAS of a with c, and replace it with b: a.compare_exchange_weak(b,c). Would this work? – Muhammad Safi Oct 26 '16 at 03:03
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/126680/discussion-between-irtaza-safi-and-peter-cordes). – Muhammad Safi Oct 26 '16 at 03:08
  • @IrtazaSafi: I added my own answer which uses Tsyvarev's idea of using the low bit of a pointer, but actually implements a usable class with most of the accessor functions you'd need. – Peter Cordes Oct 26 '16 at 04:12
  • Added usage example for Compare-and-Swap which changes both reference and mark. – Tsyvarev Oct 26 '16 at 07:25
  • The main advantage of this technique is that it's very efficient to atomically get both the pointer and the flag, and that updating both with a store can be done more efficiently than a CAS. Apparently the OP wants to CAS the whole thing for updates. Depending on the requirements for atomically getting point + flag together, a separate flag and pointer with a cmpxchg16b might be best. I still think it's better to make `val` atomic so member-functions can use atomic ops, like I showed in my answer. – Peter Cordes Oct 26 '16 at 08:13
  • @PeterCordes: *Meaning* of my class `MarkableReference` differs from yours. My class *by itself* is not prepared for atomic operations: it is just a pair of reference and mark, packed together. That is why my `val` is not *atomic*. For atomic accesses, my class is needed to be wrapped into `atomic<>`. Such a way it automatically inherits all operations with `atomic<>` with well-defined behavior. – Tsyvarev Oct 26 '16 at 08:46
  • Yes, exactly. That means you *can't* implement a `setMark` function which atomically modifies the flag without affecting the ref. (Or actually I guess you can by doing it manually with a compare_exchange_weak loop, instead of a fetch_or or fetch_and). I guess you could use `fetch_or` on an `atomic<uintptr_t>`, but only by violating encapsulation and knowing what mask to use. – Peter Cordes Oct 26 '16 at 08:52
  • Hmm, perhaps the best way to go would be to specialize the `atomic<MarkableReference<T>>` template to add some functions, while inheriting all the existing functions from the parent `std::atomic<>` class. That would avoid writing wrappers for cmpxchg like I had to, and stuff like that. Does C++ let you do that? I'm not a template expert. Actually maybe what we want is your class as a base, and another class that inherits from `atomic<MarkableReference>`. – Peter Cordes Oct 26 '16 at 08:54
  • Interestingly, in Java [AtomicMarkableReference](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicMarkableReference.html) only allows one to **set a mark conditionally** (method `attemptMark()`). The same functionality is available for `atomic<MarkableReference>` via `compare_exchange_weak` / `compare_exchange_strong` (like in the example at the end of the answer). So I think that no specialization of `atomic` is needed. – Tsyvarev Oct 26 '16 at 09:12
  • @Tsyvarev: ah that makes sense. I hadn't thought about how you'd actually use this object, but setting the boolean without caring what the pointer was is probably not useful. – Peter Cordes Oct 27 '16 at 00:58

Tsyvarev's idea (of using a bit inside the pointer) is interesting, and for some use-cases, probably more efficient than storing the boolean separately. For other use cases, storing them separately and using a double-size compare-and-exchange to swap both at once will be best. That makes it more efficient to atomically modify just the boolean or just the ref. Storing them together means you always have to do an atomic read-modify-write to change one but not the other (on x86, either a `lock or [mem], reg`, or a `lock cmpxchg` loop if you also need the old value), instead of just an atomic store that doesn't affect the other. (Or an atomic `xchg` if you want the old value.)

To implement this with two separate members of a struct, see [my answer on another question](http://stackoverflow.com/questions/38984153/implement-aba-counter-with-c11-cas/38991835#38991835) about using `compare_exchange_weak` on an `atomic<two_member_struct>`. I'd suggest storing the boolean in a pointer-sized integer, to avoid any padding in the object that needs to be ignored, or that can lead to a failed cmpxchg when the padding doesn't match but the data does.
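
A minimal sketch of that separate-members layout (the names RefAndMark and casBoth are illustrative, not taken from the linked answer):

#include <atomic>
#include <cstdint>

template<class T>
struct RefAndMark {
    T* ptr;
    uintptr_t mark;   // pointer-sized boolean, so the struct has no padding bytes
};

template<class T>
bool casBoth(std::atomic<RefAndMark<T>>& a,
             RefAndMark<T>& expected, RefAndMark<T> desired)
{
    // On x86-64 this is a 16-byte CAS (lock cmpxchg16b), which may not be
    // lock-free on every target; check a.is_lock_free() if that matters.
    return a.compare_exchange_weak(expected, desired);
}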

If you often update your MarkableReferences with a new pointer-and-boolean together, embedding data into the pointer is probably good for performance. Otherwise it's probably bad, if you can get the compiler to make efficient code for the separate-members way.

Also, if you often need to get both the pointer and flag atomically, the embedded data way is good for that.


Tsyvarev's implementation needs to change so it can atomically modify the boolean without setting a new pointer: `class MarkableReference` should itself have an atomic member variable, so it can use fetch_or and similar operations.

This is untested, but it compiles to code that looks correct (on the Godbolt compiler explorer).

On x86-64, you could let this work even for unaligned pointers by using the high bit of the pointer instead of the low bit. x86-64 requires that virtual addresses have their top 16 bits matching the 48th bit, i.e. that 64-bit addresses are really sign-extended 48-bit addresses. Future x86-64 CPUs could extend that, though, allowing a full 64-bit virtual address space instead of the current 48-bit. Then you'd have to run programs using this code in a compatibility mode where the OS never gives them addresses that are "non-canonical" according to the old rules.
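
A minimal sketch of that high-bit variant (HighBitMarkedRef is an illustrative name; it assumes canonical 48-bit x86-64 addresses and arithmetic right shifts on signed integers, which mainstream compilers provide):

#include <cstdint>

template<class T>
struct HighBitMarkedRef {
    static const uintptr_t mask = uintptr_t(1) << 63;  // the mark lives in the MSB
    uintptr_t val;

    HighBitMarkedRef(T* ref, bool mark)
        : val((reinterpret_cast<uintptr_t>(ref) & ~mask) | (mark ? mask : 0)) {}

    bool getMark() const { return val & mask; }

    T* getRef() const {
        // Shift the mark bit out the top, then arithmetic-shift back down to
        // re-generate the sign-extended (canonical) high bits of the address.
        return reinterpret_cast<T*>(static_cast<intptr_t>(val << 1) >> 1);
    }
};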

#include <atomic>
#include <cstdint>
#include <assert.h>

template<class T>
class MarkableReference
{
private:
    std::atomic<uintptr_t> val;
    static const uintptr_t mask = 1;
    uintptr_t combine(T* ref, bool mark) {
        return reinterpret_cast<uintptr_t>(ref) | mark;
    }

public:
    MarkableReference(T* ref, bool mark)
        : val(combine(ref, mark))
    {
        // note that construction of an atomic is not *guaranteed* to be atomic, in case that matters.
        // On most real CPUs, storing a single aligned pointer-sized integer is atomic
        // This does mean that it's not a seq_cst operation, so it doesn't synchronize with anything
        // (and there's no MFENCE required)
        assert((reinterpret_cast<uintptr_t>(ref) & mask) == 0 && "only works with pointers that have the low bit cleared");
    }

    MarkableReference(MarkableReference &other, std::memory_order order = std::memory_order_seq_cst)
        : val(other.val.load(order))
    {}
    // IDK if relaxed is the best choice for this, or if it should exist at all
    MarkableReference &operator=(MarkableReference &other)
    {
        val.store(other.val.load(std::memory_order_relaxed), std::memory_order_relaxed);
        return *this;
    }

/////// Getters  

    T* getRef(std::memory_order order = std::memory_order_seq_cst)
    {
        return reinterpret_cast<T*>(val.load(order) & ~mask);
    }
    bool getMark(std::memory_order order = std::memory_order_seq_cst)
    {
        return (val.load(order) & mask);
    }
    T* getBoth(bool& mark, std::memory_order order = std::memory_order_seq_cst)
    {
        uintptr_t current = val.load(order);
        mark = current & mask;
        return reinterpret_cast<T*>(current & ~mask);
    }

/////// Setters (and exchange)

    // memory_order_acq_rel would be a good choice here
    T* xchgRef(T* ref, std::memory_order order = std::memory_order_seq_cst)
    {
        uintptr_t old = val.load(std::memory_order_relaxed);
        bool success;
        do {
            uintptr_t newval = reinterpret_cast<uintptr_t>(ref) | (old&mask);
            success = val.compare_exchange_weak(old, newval, order);
            // updates old on failure
        } while( !success );

        return reinterpret_cast<T*>(old & ~mask);
    }

    bool cmpxchgBoth_weak(T* &expectRef, bool& expectMark, T* desiredRef, bool desiredMark, std::memory_order order = std::memory_order_seq_cst)
    {
        uintptr_t desired = combine(desiredRef, desiredMark);
        uintptr_t expected = combine(expectRef, expectMark);
        bool status = val.compare_exchange_weak(expected, desired, order);
        expectRef = reinterpret_cast<T*>(expected & ~mask);
        expectMark = expected & mask;
        return status;
    }

    void setRef(T* ref, std::memory_order order = std::memory_order_seq_cst)
    { xchgRef(ref, order); }  // I don't see a way to avoid cmpxchg without a non-atomic read-modify-write of the boolean.
    void setRef_nonatomicBoolean(T* ref, std::memory_order order = std::memory_order_seq_cst)
    {
        uintptr_t old = val.load(std::memory_order_relaxed);  // maybe provide a way to control this order?
        // !!modifications to the boolean by other threads between here and the store will be stepped on!
        uintptr_t newval = combine(ref, old&mask);
        val.store(newval, order);
    }


    void setMark(bool mark, std::memory_order order = std::memory_order_seq_cst)
    {
        if(mark)
            val.fetch_or(mask, order);
        else
            val.fetch_and(~mask, order);
    }

    bool toggleMark(std::memory_order order = std::memory_order_seq_cst)
    {
            return mask & val.fetch_xor(mask, order);
    }

    bool xchgMark(bool mark, std::memory_order order = std::memory_order_seq_cst)
    {
        // setMark might still compile to efficient code if it just called this and let the compiler optimize away the fetch part
        uintptr_t old;
        if(mark)
            old = val.fetch_or(mask, order);
        else
            old = val.fetch_and(~mask, order);
        return (old & mask);
        // It might be ideal to compile this to x86 BTS or BTR instructions (when the old value is needed)
        // but clang uses a cmpxchg loop.
    }
};

Usage examples, with asm output showing that this compiles efficiently. (See the Godbolt compiler explorer mentioned above.)

int a;
int b;
MarkableReference<int> mr_a(&a, true);
MarkableReference<int> mr_b(&b, false);


bool getbool(MarkableReference<int> &mr) {
  return mr.getMark();  // using the default memory_order_seq_cst
}
    mov     rax, qword ptr [rdi]
    and     eax, 1
    ret

void storeToRef(MarkableReference<int> &mr, int val) {
  //(*mr.getRef(memory_order_relaxed)) = val;  // less code on weakly-ordered CPUs like MIPS
  (*mr.getRef()) = val;
}
    mov     rax, qword ptr [rdi]
    and     rax, -2
    mov     dword ptr [rax], esi
    ret

void foo() {
  mr_a.setMark(true, memory_order_relaxed);
}
    lock
    or      qword ptr [rip + mr_a], 1
    ret


void bar() {
  mr_b = mr_a;
}
    // no MFENCE since I decided to make operator= use memory_order_relaxed.  acquire / release would also not need MFENCE on x86
    mov     rax, qword ptr [rip + mr_a]
    mov     qword ptr [rip + mr_b], rax
    ret

// this one compiles to a cmpxchg loop and a branch :/
// but I think that's unavoidable
void baz() {
  bool tmp = mr_b.xchgMark(false);
  mr_a.setMark(tmp);
}

    mov     rax, qword ptr [rip + mr_b]

.LBB4_1:                                # =>This Inner Loop Header: Depth=1
    mov     rcx, rax
    and     rcx, -2
    lock cmpxchg qword ptr [rip + mr_b], rcx
    jne     .LBB4_1

    test    al, 1
    jne     .LBB4_3

    lock and     qword ptr [rip + mr_a], -2
    ret

.LBB4_3:
    lock or      qword ptr [rip + mr_a], 1
    ret
Peter Cordes