
I have a question about a multithreaded shared-variable problem. The two variables look like this:

{
    void* a;
    uint64_t b;
}

Only one thread ever modifies the two variables; other threads frequently read them. I want to change a and b at the same time, so that readers see the change together (the new value of a along with the new value of b). Because many threads frequently read these two variables, I don't want to add a lock. Is there a way to combine the changes to a and b so they act like one atomic operation? For example, would using a memory fence work? Thank you!

qiaochu li
  • Condition variables maybe. – kiner_shah Jul 14 '22 at 07:40
  • @kiner_shah: Those need a `std::mutex`, not going to help. A condition variable helps with wakeup, but there's no thread to wake here. – MSalters Jul 14 '22 at 07:41
  • @MSalters, yeah it does use `std::mutex`, had forgotten about that. – kiner_shah Jul 14 '22 at 07:42
  • You can create a pointer pointing to that data, and instead of reading the data directly, dereference the pointer each time. With that you can do atomic reads and writes without locks, by updating only the pointer. If you can't do that, then you have to use a lock. There's no other choice (or rather: every other solution will probably be equivalent to a lock). Finally: why is using a reader/writer lock an issue? – freakish Jul 14 '22 at 07:44
  • @freakish Because it is a ring buffer (RB) with one writer and multiple readers, and it was lockless. Now I want to add a function to enlarge the RB: a is the memory address and b is the length mask of the RB. When I enlarge the RB, both variables have to change, but reader threads might be reading data from the RB at the same time, and so will read these two variables. If a reader thread sees the old a with the new b, or the old b with the new a, it will cause problems. And I think taking a lock on every read access will hurt read performance. – qiaochu li Jul 14 '22 at 08:02
  • 1
    Some platforms allow for a 16-byte atomic to be lock-free, e.g. `struct my_pair { void *a; uint64_t *b }; std::atomic;`. Then you can read and write the pair atomically without a mutex. However, the instructions to do this may be quite expensive (e.g. a double-wide compare-and-swap), and that price may be paid by both reads and writes. It's possible that a mutex may turn out to be more efficient after all, especially if you use a shared mutex (aka read-write lock) so that the readers don't contend with each other for the lock. – Nate Eldredge Jul 14 '22 at 08:05
  • @qiaochuli You should measure before claiming that it will visibly affect performance, especially with a reader/writer lock. This sounds like premature optimization. – freakish Jul 14 '22 at 08:29
  • @NateEldredge: Locking an RW lock still involves all readers doing an atomic RMW on a shared location, like implementations of 16-byte `atomic::load` that have to use the read side of a DWCAS like `lock cmpxchg16b`, except it doesn't necessarily succeed, and needs another write to unlock. Unless you have some locking technique I'm not aware of? Or hardware lock elision, but Intel has disabled that in microcode on everything that used to support it, leaving only RTM where available (which would also allow read-only readers, but all those CPUs support AVX). – Peter Cordes Jul 15 '22 at 00:22

1 Answer


You're looking for a SeqLock.

It's ideal for this use-case, especially with infrequently-changed data (e.g. a time variable updated by a timer interrupt but read all over the place).

SeqLock advantages include perfect read-side scaling (readers don't need to get exclusive ownership of any cache lines; they're truly read-only, not just lock-free), so any number of readers can read as often as they like with zero contention with each other. The downside is an occasional retry, if a reader happens to try to read at just the wrong time. That's rare, and only happens when the writer has just written something.

So readers aren't quite wait-free, and in fact, if the writer sleeps at just the wrong time (mid-update), the readers are stuck retrying until it wakes up again! So overall the algorithm isn't even lock-free or obstruction-free. But in the very common fast path, the cost is just two extra reads from the same cache line as the data, plus whatever is necessary for LoadLoad ordering in the reader. If there's been no write since the last read, the loads can all be L1d cache hits.
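
Here's a minimal sketch of the pattern in C++ (my own illustration, not code from this answer; the class and member names are made up, but the ordering follows the usual single-writer SeqLock recipe, with the payload as relaxed atomics to avoid a C++ data race):

    #include <atomic>
    #include <cstdint>

    class SeqLockPair {
        std::atomic<unsigned> seq{0};          // even = stable, odd = write in progress
        std::atomic<void*>    a{nullptr};
        std::atomic<uint64_t> b{0};

    public:
        // Writer side: exactly one thread may call this.
        void store(void* new_a, uint64_t new_b) {
            unsigned s = seq.load(std::memory_order_relaxed);
            seq.store(s + 1, std::memory_order_relaxed);          // odd: write in progress
            std::atomic_thread_fence(std::memory_order_release);  // bump visible before payload stores
            a.store(new_a, std::memory_order_relaxed);
            b.store(new_b, std::memory_order_relaxed);
            seq.store(s + 2, std::memory_order_release);          // payload stores before "stable" again
        }

        // Reader side: any number of threads; retries if it raced with a write.
        void load(void*& out_a, uint64_t& out_b) const {
            unsigned s0, s1;
            do {
                s0 = seq.load(std::memory_order_acquire);         // counter read before payload reads
                out_a = a.load(std::memory_order_relaxed);
                out_b = b.load(std::memory_order_relaxed);
                std::atomic_thread_fence(std::memory_order_acquire); // payload reads before the recheck
                s1 = seq.load(std::memory_order_relaxed);
            } while (s0 != s1 || (s0 & 1));                       // torn or mid-write: retry
        }
    };

The two seq reads bracketing the payload reads are the "two extra reads from the same cache line" mentioned above; the acquire operations provide the LoadLoad ordering.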


The only thing better is if you have efficient 16-byte atomic stores and loads, like Intel (but not AMD yet) CPUs with AVX, and a compiler / libatomic that uses them for 16-byte loads of `std::atomic<struct_16bytes>` instead of x86-64 `lock cmpxchg16b`. (In practice most AMD CPUs are thought to have atomic 16-byte load/store as well, but only Intel has officially put it in their manuals that the AVX feature bit implies atomicity for aligned 128-bit load/store such as `movaps`, so compilers can safely start using it.)

Or AArch64, which guarantees 16-byte atomicity for plain `stp` / `ldp` in ARMv8.4, I think.

But without those hardware features, and compiler options to take advantage of them, 16-byte loads often get implemented as an atomic RMW, meaning each reader takes exclusive ownership of the cache line. That means reads contend with other reads, instead of the cache line staying in Shared state, hot in the cache of every core that's reading it.
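
If you want to experiment with that route, a quick check (my sketch, not from this answer; `Pair` is just the struct from the question) is whether `std::atomic` of the 16-byte pair reports lock-free on your target:

    #include <atomic>
    #include <cstdint>
    #include <cstdio>

    // The pair from the question; trivially copyable, 16 bytes on LP64 targets.
    struct Pair {
        void*    a;
        uint64_t b;
    };

    int main() {
        std::atomic<Pair> p{Pair{nullptr, 0}};
        // On x86-64, try compiling with -mcx16 (GCC/Clang). Whether this
        // reports lock-free, and what instructions the load/store compile
        // to, depends on the compiler, libatomic, and CPU.
        std::printf("16-byte atomic lock-free: %d\n",
                    static_cast<int>(p.is_lock_free()));

        Pair snapshot = p.load(std::memory_order_acquire);   // atomic 16-byte load
        p.store(Pair{nullptr, 42}, std::memory_order_release);
        (void)snapshot;
    }

Note that even when `is_lock_free()` returns true, the load may still compile to an RMW like `lock cmpxchg16b`, with the read-contention cost described above.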


"would using a memory fence work?"

No. Memory fences can't create atomicity (they can't glue multiple operations into one larger transaction); they can only create ordering between operations.

Although you could say that the idea behind a SeqLock is to carefully order the writes and reads (with respect to the sequence variable) so as to detect torn reads and retry when that happens. So yes, barriers are important for that.
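
To see why fences alone don't help, here's a sketch (my illustration, not from this answer) where both sides use acquire/release ordering, yet a reader can still observe a mixed pair:

    #include <atomic>
    #include <cstdint>

    std::atomic<void*>    a{nullptr};
    std::atomic<uint64_t> b{0};

    void writer(void* new_a, uint64_t new_b) {
        a.store(new_a, std::memory_order_release);
        b.store(new_b, std::memory_order_release);
    }

    void reader(void*& out_a, uint64_t& out_b) {
        out_a = a.load(std::memory_order_acquire);
        // The writer can run between these two loads: even with the
        // strongest fences, this reader can see the new a paired with
        // the old b. Ordering controls when each store becomes visible,
        // not which *pair* of values a reader observes.
        out_b = b.load(std::memory_order_acquire);
    }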

Peter Cordes