1
#include <iostream>

#include <atomic>
#include <memory>


template<typename T>
class LockFreeQueue {
public:
    struct CountedNode;

private:
    std::atomic<CountedNode> head;
public:
    struct Node{
        explicit Node(const T& d) : next(CountedNode()), data(std::make_shared<T>(d)), node_counter(0) { }
        std::atomic<CountedNode> next;
        std::shared_ptr<T> data;
        std::atomic<unsigned> node_counter;
    };
    struct CountedNode {
        CountedNode() noexcept : node(nullptr), counter(0) {}
        explicit  CountedNode( const T& data) noexcept : node(new Node(data) /* $4 */), counter(0)  {}
        Node* node;
        int counter;
    };


    void push( const T& data)
    {
        CountedNode new_node(data), curr, incrementedNext, next /*($2) */;
        CountedNode empty; /*($3) */
        if (head.compare_exchange_strong(empty, new_node)) std::cout << "EQUALS\n"; // $1
        else std::cout << "NOT EQUALS\n";

        if (head.compare_exchange_strong(next, new_node)) std::cout << "EQUALS\n"; // $1
        else std::cout << "NOT EQUALS\n";
    }

};


int main() {
    LockFreeQueue<int> Q;
    Q.push(2);

    return 0;
}



    int main(){
    LockFreeQueue<int> Q;
    Q.push(2);

    return 0;
    }

Ok. It compiled and executed without error. But, there is still problem, which I described below.

http://coliru.stacked-crooked.com/a/1fe71fafc5dde518

On my eye, result it is not expected: NOTEQUALS EQUALS

I have a wild problem with above piece of code.

Especially, the comparison in the line $1 makes me a problem. I mean, this comparison always returns false though it should returns true at the first time.

I was confused so I look into memory for empty and head and actually they are different. head is equal to 0x00000000 0x00000000 0x00000000 0x00000000 ( when it comes to bytes) and it seems to be OK. But empty is equal to: 0x00000000 0x00000000 0x00000000 0x7f7f7f7f7f. What is more interesting next in the $2 is equal to 0x00000000 0x00000000 0x00000000 0x00000000 so in fact, it is equals to head. But, for example, curr, incrementedNext are equal to 0x00000000 0x00000000 0x00000000 0x7f7f7f7f7f. So the behaviour of that is undeterministic so I suppose any undefined behaviour, but why? What I do not correctly, please explain me this behaviour.

P.S. I know about memory leak in the $4 but now I'm ignoring it.

I compiled it with: g++ -latomic main.cpp -std=c++14. My version of gcc is 6.1.0. I tested on gcc 5.1.0 as well. The result is same.

The link to the source created by @PeterCordes: https://godbolt.org/g/X02QV8

Gilgamesz
  • 4,727
  • 3
  • 28
  • 63
  • 1
    You [just asked this](https://stackoverflow.com/questions/38862289/the-same-instances-of-the-same-class-but-different-behaviour-probable-ub). – Kerrek SB Aug 10 '16 at 08:45
  • Yes, I deleted and created new ( correct) post. – Gilgamesz Aug 10 '16 at 08:45
  • @KerrekSB, is it a problem? After all, I deleted my previous. – Gilgamesz Aug 10 '16 at 08:46
  • 3
    Well, you could always edit the original post until it's correct. – Kerrek SB Aug 10 '16 at 08:52
  • So, you are setting head to new_node, expecting it to be empty and then setting it to new_node again, expecting it to be next? – 1stCLord Aug 10 '16 at 09:26
  • 1
    I just copy/paste your code and compile it, it works for me: [example](https://ideone.com/LjSNTA). Can I suggest you to write the compiler version and your command to compile the code? – BiagioF Aug 10 '16 at 09:42
  • @BiagioFesta, I edited. TheSombreroKid, it doesn't matter in that concrete problem. As you can see, BiagioFesta gets other results. – Gilgamesz Aug 10 '16 at 10:18
  • 1
    @Gilgamesz I just compiled (gcc 6.1) with `g++ -m32 -g -o test test.cpp -latomic` and it works correctly. Try to enable debug flag (`-g`) when you use the debugger and look at the memory. – BiagioF Aug 10 '16 at 10:24
  • 1
    @BiagioFesta Oh, it is interesting. When I add a flag `-m32` it works as I am expecting. Without that, it doesn't. What is going on? Especially, for `std::atomic::CountedNode> c;` a `std::atomic_is_lock_free(&c);` returns true. – Gilgamesz Aug 10 '16 at 10:31
  • 1
    @Gilgamesz [some related](https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html#Atomic-Builtins). What's your architecture? – BiagioF Aug 10 '16 at 10:45
  • @BiagioFesta, x86-64 Intel IvyBridge. – Gilgamesz Aug 10 '16 at 10:48
  • 1
    @Gilgamesz I think in 64bit mode the `compare_exchange_strong` is not supported. For example in my case the linker gives me an error when I try to compile without `-m32`. – BiagioF Aug 10 '16 at 10:57
  • 1
    re: deleting your old question. Yes, you absolutely should have just edited your old one. It already had two upvotes. I and most other people would have voted to reopen. Never delete/re-ask your questions, it's totally not ok. But since you've already done it, the least bad option is to keep this one. There are already comments discussing the new code here, and the comments on the old question apply to it, not this. (Do not delete this one and undelete your old one) – Peter Cordes Aug 10 '16 at 13:56
  • 2
    @BiagioFesta: Baseline x86-64 doesn't have `cmpxchg16b`, so you need to use `-mcx16` to enable it (or it's enabled as part of `-march=haswell`, or `-march=native` on anything but the oldest CPUs). If you don't do that, gcc emits a call to a library function. You can use `-latomic` to avoid the link error. – Peter Cordes Aug 10 '16 at 15:20
  • 1
    The code in the question doesn't compile, because you left out a brace and `;` after `push`. It also triggers a [warning from `-Wreorder`](http://stackoverflow.com/questions/1828037/whats-the-point-of-g-wreorder). I put a version of your code [on godbolt](https://godbolt.org/g/X02QV8) with a fix for this (initialization order in `explicit Node(const T& d)`), and proper indenting for main, and `-mcx16` so the code uses `lock cmpxchg16b`. You should update your question with that code, IMO. And the godbolt full-link. – Peter Cordes Aug 10 '16 at 15:26
  • Thanks @PeterCordes a lot, I will read the answer in a moment but firstly I would like to know: That's true that compiler generated call to function defined in stdlib instead of `cmpxchg16B` instruction. But, does it matter? I mean, what is difference between this `call` and native instruction? – Gilgamesz Aug 10 '16 at 17:31
  • 1
    @Gilgamesz: The function probably takes a global lock, so it's much slower. It's probably the same function that's used for atomic compare-exchange of objects larger than 16B. You could single-step through it. Even if the function uses `lock cmpxchg16B` because it's running on hardware that supports it, there's still the overhead of a function call. – Peter Cordes Aug 10 '16 at 17:34

1 Answers1

4

Padding. std::atomic::compare_exchange* compares memory representation of the two objects, as if by memcmp. If the structure has paddding, its contents are indeterminate, and may make two instances look distinct even if they are member-wise equal (note that CountedNode doesn't even define operator==).

In 64-bit build, there's padding after counter, and you see the issue. In 32-bit build, there isn't, and you don't.

EDIT: the part below is, I now believe, wrong; only preserved for completeness. std::atomic_init doesn't do anything to zero out padding; the example only appears to work by accident.


head (as well as Node::next) should be initialized with std::atomic_init:

std::atomic_init(&head, CountedNode());

With that in place, your example works as expected

Igor Tandetnik
  • 50,461
  • 4
  • 56
  • 85
  • It doesn't work. The problem still exists. Your link works only because of the fact, that content of padding is indeterminate and exactly in your piece of code `empty`'s padding is equal to zero. But, if you change a view of stack ( see the link what I mean, (35 line) ) a bit you will see `NOT EQUAL` again, see: http://coliru.stacked-crooked.com/a/d6a5c66734862e91 I tried it solve with "manual" writing to `empty`s memory and it helped, I mean: http://coliru.stacked-crooked.com/a/56c9c69622fb3492 – Gilgamesz Aug 10 '16 at 20:16
  • 1. In that situation, how does work std::atomic_init and is it necessary? 2. How to solve that problem in better way than manual writing to padding? – Gilgamesz Aug 10 '16 at 20:16
  • Right. In fact, I suspect that `std::atomic_init` doesn't really help any - it just `memcpy` its argument into the atomic, but nothing says the argument has its padding zeroed-out. So basically, looks like `std::atomic` where `T` is something other than an integral or pointer type is a bad idea. – Igor Tandetnik Aug 10 '16 at 20:48
  • When writing for a particular platform, you could design your structure in a way that is likely to avoid padding (e.g. use `uintptr_t` for `counter`, so it's the same size as a pointer, which should eliminate padding with most compilers). But I can't think of a way, in general, to use `std::atomic` portably. It simply wasn't designed for that. – Igor Tandetnik Aug 10 '16 at 21:01
  • This is explicitly called out in [\[atomics.types.operations.req\]/27](http://eel.is/c++draft/atomics.types.operations.req#27). – T.C. Aug 10 '16 at 21:02
  • I'm not sure that atomic non-primitive types are a bad idea. They can easily be inefficient (e.g. too big to atomically load or store in a single operation), and you have to remember that bitwise equality shouldn't be expected. I guess if you wanted to write a non-bad version of this function, you'd run into a need for something like memcpy that you can use on an atomic object, to initialize the expected value before a swap. I'm not sure what the best way do that would be. But being able to atomically update a group of 2 or 4 members seems useful. – Peter Cordes Aug 10 '16 at 21:07
  • Ok, thanks @PeterCordes and T.C.. I decided to use solution with `memcpy` to prepare whole ( members with padding) object ;). – Gilgamesz Aug 10 '16 at 21:10
  • @Gilgamesz: keep in mind that `memcpy` has `memory_order_relaxed` semantics, and *isn't* necessarily atomic. Of course, if it sees tearing in this use-case, that will just mean that the following cmpxchg fails. But make sure you don't use it for anything where those limitations matter. – Peter Cordes Aug 10 '16 at 21:13
  • @PeterCordes, I am gonna use `memcpy/memset` on **non**-shared object and store it in atomical way – Gilgamesz Aug 10 '16 at 21:16
  • @T.C. Ah, I see (took me a few passes through the text). Both strong and weak versions would work equally well when used in a loop: on the first call, if `*this` and `expected` only differ in padding, the operation would fail and `expected` would be modified to match `*this` byte for byte; and then on the second call the operation would succeed. But once you are running a loop anyway, might as well use weak version, as more efficient (at least on some architectures). – Igor Tandetnik Aug 10 '16 at 22:45