
Let us consider the following one-reader/one-writer queue implemented with a linked list.

struct queue {
  queue() {
    tail = head = &reserved;
    n = 0;
  }
  void push(item *it) {
    tail->next = it;
    tail = it;
    n++;
  }
  item* pop() {
    while (n == used);
    ++used;
    head = head->next;
    return head;
  }

  item reserved;
  item *tail, *head;

  int used = 0;
  std::atomic<int> n;
};

Now I find that using volatile int n makes my writer run faster, but I'm not sure whether it guarantees that head = head->next always reads the correct value.

UPDATED: What if I add an atomic operation between tail->next = it and n++, i.e.,

void push(item *it) {
   tail->next = it;
   tail = it;
   a.store(0, std::memory_order_release);
   a.load(std::memory_order_acquire);
   n++;
}

in which a is never accessed by the reader? Will this guarantee the ordering of tail->next = it before head = head->next? (It still runs faster than using an atomic n.)

Yuxai
  • Outside of the thread coordination issue, this code appears to me to have several issues. Primarily, you can't release nodes when you are done with them, since the last expended one must be kept alive to keep the queue valid. – Avi Berger Feb 19 '20 at 05:00
  • Other issues: 1) head = head->next; return head; looks like an obvious bug that in the future a programmer will "fix" without realizing that your implementation depends on an unusual extended lifetime for nodes (that will be difficult or impossible to achieve with raw pointers). 2) What happens when you hit the implementation defined behavior of used and n overflowing? 3) What happens in the (perhaps impossible) case that your writer laps your reader? – Avi Berger Feb 19 '20 at 05:10
  • `volatile` is unrelated to threading, use atomic types or mutex instead. – Jarod42 Feb 19 '20 at 08:46
  • @Jarod42 What makes you say that volatile can't be used for threads? – curiousguy Feb 21 '20 at 04:40
  • "`a.load(std::memory_order_acquire);`" What is that even supposed to mean? You are not using the result! – curiousguy Feb 21 '20 at 04:40
  • @curiousguy: `volatile` is **not** for synchronization (but can be used in threading code) see for example explanation in [why-is-the-volatile-qualifier-used-through-out-stdatomic](https://stackoverflow.com/a/2479474/2684539) – Jarod42 Feb 21 '20 at 08:48

2 Answers


The volatile keyword in C++ does not guarantee that, in a multi-threaded environment, reads and writes happen in the order in which they appear in the code. So if you strip the std::atomic wrapper from the counter and make it a bare volatile int, a consumer thread that observes the counter being incremented has no guarantee that item::next has also been updated.

To achieve maximum performance while keeping the guarantee, I think you need at least a write barrier between the store to tail->next and the increment of the counter, e.g. n.fetch_add(1, std::memory_order_release), and a read barrier just before the consumer reads head->next, e.g. n.load(std::memory_order_acquire). I don't know the CPU-arch-specific details, though.

findall
  • I see. Following what you describe, imagine that I simply add k.store(0, std::memory_order_release) between tail->next = it and n++, where k is an atomic variable never accessed by the reader. Will this guarantee that tail->next happens before n++ (and hence before head = head->next), without any atomic load operations by the reader? – Yuxai Feb 19 '20 at 06:06
  • I think it leads to so-called undefined behavior. The atomic load/store functions are not magic machine instructions that somehow make the world sequential on all platforms. Rather, using them in the way the C++ standard defines imposes requirements on the compiler to emit appropriate machine code no matter which CPU architecture the target is. – findall Feb 19 '20 at 07:03

As already pointed out in several other comments, volatile is unrelated to multithreading, so it should not be used here. The reason volatile performs better than an atomic is simply that with a volatile, ++n translates to plain load, inc and store instructions, while with an atomic it translates to a more expensive lock xadd (assuming you compile for x86).

But since this is only a single-reader single-writer queue, you don't need expensive read-modify-write operations:

struct queue {
  queue() {
    tail = head = &reserved;
    n = 0;
  }
  void push(item *it) {
    tail->next = it;
    tail = it;
    auto new_n = n.load(std::memory_order_relaxed) + 1;
    n.store(new_n, std::memory_order_release);
  }
  item* pop() {
    while (n.load(std::memory_order_acquire) == used);
    ++used;
    head = head->next;
    return head;
  }

  item reserved;
  item *tail, *head;

  int used = 0;
  std::atomic<int> n;
};

This should perform as well as a volatile version. If the acquire-load in pop "sees" the value written by the store-release in push, the two operations synchronize-with each other, thereby establishing the required happens-before relation.

mpoeter