
I have the following struct:

#include <atomic>

struct info {
 unsigned long a;
 unsigned long b;
};

std::atomic<info> data;
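Whether `std::atomic<info>` is lock-free depends on the target. If `unsigned long` is 8 bytes, the struct is 16 bytes. The following is a minimal compile-time check, assuming C++17 (for `is_always_lock_free`). `info32` is a hypothetical name for a layout with two `unsigned int` members:

```cpp
#include <atomic>
#include <cstdio>

// Layout from the question: 16 bytes on LP64 targets such as x86-64 Linux.
struct info {
    unsigned long a;
    unsigned long b;
};

// Hypothetical narrower layout with two unsigned ints (8 bytes on common targets).
struct info32 {
    unsigned int a;
    unsigned int b;
};

// is_always_lock_free (C++17) is a compile-time constant, so reading it does not
// emit any atomic operations and does not need libatomic, even for 16 bytes.
constexpr bool info_lock_free = std::atomic<info>::is_always_lock_free;
constexpr bool info32_lock_free = std::atomic<info32>::is_always_lock_free;

// Prints the size of each layout and whether lock-freedom is guaranteed at compile time.
void report_lock_freedom() {
    std::printf("atomic<info>   (%zu bytes): %s\n", sizeof(info),
                info_lock_free ? "always lock-free" : "not guaranteed lock-free");
    std::printf("atomic<info32> (%zu bytes): %s\n", sizeof(info32),
                info32_lock_free ? "always lock-free" : "not guaranteed lock-free");
}
```

A `false` result means lock-freedom is not guaranteed at compile time. `data.is_lock_free()` gives the runtime answer, but on GCC it may require linking with `-latomic` for 16-byte types. The 16-byte result can also vary with compiler version and flags such as `-mcx16`.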

It is used by a writer thread and a reader thread. The reader has to respond to new values as fast as possible. To do that, I've implemented the following in the reader:

while (true) {
  auto value = data.load();
  // do some operations given these new values
}
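Here is a self-contained sketch of the pattern above. It makes three assumptions: it uses the 8-byte `unsigned int` layout the asker later switched to, it adds a stop condition so the reader can exit, and it wraps everything in a hypothetical `run_spin_demo` helper. Because `data.load()` returns both members as one atomic snapshot, the reader never sees an old `a` paired with a new `b`:

```cpp
#include <atomic>
#include <thread>

// Assumed 8-byte layout so std::atomic<info> is lock-free on common 64-bit targets.
struct info {
    unsigned int a;
    unsigned int b;
};

std::atomic<info> data{info{0, 0}};

// The writer publishes pairs {i, 2*i}. The reader busy-polls until it sees the last
// pair and counts loads whose members are inconsistent (torn). Because the whole
// struct is loaded atomically, that count is always 0.
unsigned run_spin_demo(unsigned updates) {
    unsigned torn = 0;
    std::thread reader([&] {
        while (true) {
            info value = data.load();           // one atomic snapshot of a and b
            if (value.b != 2 * value.a) ++torn; // cannot happen with std::atomic<info>
            if (value.a == updates) break;      // final update observed: stop spinning
        }
    });
    std::thread writer([&] {
        for (unsigned i = 1; i <= updates; ++i)
            data.store(info{i, 2 * i});
    });
    writer.join();
    reader.join();
    return torn;
}
```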

This loop is very CPU-intensive. I chose this approach because I believe it reacts faster than, for example, having the reader wait on a condition variable and be woken up when the data changes. Also, the data is updated frequently, hundreds of times per second. Is there a better way to do this that still gives the fastest reaction time?
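Timing both approaches requires a condition-variable version as well. Here is a minimal "latest value" mailbox sketch. `LatestValue`, `publish`, and `wait_newer` are hypothetical names. The reader sleeps until the writer bumps a version counter, and it may skip intermediate values. That suits a reader that only cares about the newest data:

```cpp
#include <condition_variable>
#include <mutex>

struct info {
    unsigned long a;
    unsigned long b;
};

// Hypothetical mailbox: the writer overwrites the value; the reader blocks
// until there is a version it has not seen yet.
class LatestValue {
public:
    void publish(const info& v) {
        {
            std::lock_guard<std::mutex> lock(m_);
            value_ = v;
            ++version_;
        }
        cv_.notify_one();  // wake the reader if it is waiting
    }

    // Blocks until a version newer than `seen` exists, then returns the newest
    // value and updates `seen`. Intermediate values may be skipped.
    info wait_newer(unsigned long long& seen) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return version_ != seen; });
        seen = version_;
        return value_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    info value_{0, 0};
    unsigned long long version_ = 0;
};
```

Measuring wake-up latency of `wait_newer` against the spin loop on the target machine is the reliable way to settle which reacts faster. The OS wake-up typically costs more latency, but the reader uses no CPU while it waits.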

Asked by toco; edited by Nicol Bolas
  • I'd recommend you implement it both ways and compare speed and resource usage. – Mikael Apr 29 '19 at 19:37
  • If you're on x86, you may want a `pause` instruction (`_mm_pause()`) in the spin loop to avoid memory-order mis-speculation pipeline flushes when it does change. And also to save power. (And BTW, hundreds of times per second is only one update per millions of clock cycles on a multi-GHz CPU, that's pretty trivial. But you might want to check if `data` is actually different from the last value you saw.) – Peter Cordes Apr 29 '19 at 19:44
  • Anyway, I think it is pretty much a tradeoff between reaction time and CPU usage; I think spin-waiting is lower latency than an OS-assisted wakeup. In kernel mode on current x86, there's monitor/mwait, which would let the CPU wake up on a change to a certain address. (Or in user space on upcoming Intel Tremont (Atom) CPUs, with umonitor / umwait.) But that still means the CPU can't be doing something else; it has to be asleep waiting for an event instead of busy-waiting. – Peter Cordes Apr 29 '19 at 19:46
  • Also, check that your compiler really is treating that 2-member struct as a lock-free object. If `unsigned long` is an 8-byte type, then you have a 16-byte atomic load. On x86-64, that's only possible with `lock cmpxchg16b`, and gcc7 and later don't inline that. Can you spin on only one member of the struct, with a union hack? [How can I implement ABA counter with c++11 CAS?](//stackoverflow.com/q/38984153) – Peter Cordes Apr 29 '19 at 19:48
  • A busy loop / spin loop is *almost* always the wrong solution. – Jesper Juhl Apr 29 '19 at 20:13
  • Thanks, Peter. It seems that the struct of two `unsigned long`s was not lock-free, but with `unsigned int` members it is, and `int`s are enough for my needs. I want to use spin-waiting mostly for reaction time, to react as fast as possible to a change in value. I will add the `_mm_pause()`. – toco Apr 29 '19 at 22:27
  • Why, Jesper? Your comment isn't helpful, as it doesn't provide a clarification or an alternative solution. – toco Apr 29 '19 at 22:38
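The two spin-loop suggestions from the comments can be combined: a `pause` hint, and reacting only when `data` actually changed. This is a sketch with several assumptions. `cpu_relax`, `same`, and `wait_for_change` are hypothetical names. The non-x86 branch is a plain no-op. The struct uses the `unsigned int` layout the asker found to be lock-free:

```cpp
#include <atomic>

#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <immintrin.h>
// x86: PAUSE marks a spin-wait loop (saves power, avoids the memory-order
// mis-speculation pipeline flush when the loop exits).
inline void cpu_relax() { _mm_pause(); }
#else
// Assumed fallback for other targets: do nothing.
inline void cpu_relax() {}
#endif

// 8-byte layout, lock-free on common 64-bit targets.
struct info {
    unsigned int a;
    unsigned int b;
};

inline bool same(const info& x, const info& y) { return x.a == y.a && x.b == y.b; }

// Spins until `data` holds a value different from `last`, then returns it.
// The acquire loads pair with release (or default seq_cst) stores in the writer.
inline info wait_for_change(const std::atomic<info>& data, const info& last) {
    info value = data.load(std::memory_order_acquire);
    while (same(value, last)) {
        cpu_relax();
        value = data.load(std::memory_order_acquire);
    }
    return value;
}
```

A reader loop would then be `last = wait_for_change(data, last);` followed by the processing, so work is done once per actual update rather than once per poll.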

1 Answer


A semaphore is indeed a good option for letting a writer signal new data, while the reader wakes up whenever data is ready to be consumed. However, for high-performance scenarios you should consider a lock-free queue, such as the one written by Moody Camel (`moodycamel::ConcurrentQueue`). Such a queue lets writers add new entries without blocking the reader(s), and lets the reader take data as fast as possible without blocking the writer(s). That way, data is processed at maximum speed when it is available, and no CPU resources are consumed otherwise.
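Moody Camel's queue is a third-party library. The underlying idea can be shown without a dependency as a minimal single-producer/single-consumer ring buffer. This is a sketch and not that library's API: `SpscQueue`, `try_push`, and `try_pop` are hypothetical names:

```cpp
#include <atomic>
#include <cstddef>

// Minimal SPSC lock-free ring buffer: exactly one thread calls try_push and
// exactly one thread calls try_pop. Holds at most Capacity - 1 elements.
// A production queue would also pad head_ and tail_ onto separate cache lines.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    bool try_push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                              // full
        buffer_[head] = item;
        head_.store(next, std::memory_order_release);  // publish the filled slot
        return true;
    }

    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;                              // empty
        out = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);  // free the slot
        return true;
    }

private:
    T buffer_[Capacity];
    std::atomic<std::size_t> head_{0};  // next slot the producer writes
    std::atomic<std::size_t> tail_{0};  // next slot the consumer reads
};
```

A non-blocking `try_pop` alone still leaves the consumer to choose between spinning and sleeping when the queue is empty. For the sleeping case, moodycamel also provides a `BlockingConcurrentQueue`.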

Answered by Mike Lischke
  • I don't really need a queue, hence the above construct. I wanted something faster than waking up the thread, that's why I was wondering about the spin-wait and the best way to implement it and potential issues. – toco Apr 29 '19 at 22:37
  • Well, I tried to address your requirement for fast handling of new data, and things go fastest without any lock, regardless of how fast your spin lock could ever be. – Mike Lischke Apr 30 '19 at 07:18