
I have an std::array of 64-bit structs, where each struct looks like:

struct MyStruct
{
    float _a{0};
    float _b{0};
};  // Assume packed

One thread (CPU core) will write the 64-bit object and a second thread (different core) will read it.

I am using Intel x86 architecture and I know 64-bit writes are guaranteed to be atomic, from the Intel Developer Manuals.

However, I'm worried the second thread might cache the value in a register and not detect when the value has changed.

  • Will the MESIF protocol guarantee the second thread sees the writes?
  • Do I need the volatile keyword to tell the compiler another thread might be modifying the memory?
  • Do I need atomics?

The thread writing the values is extremely performance sensitive and I'd like to avoid memory barriers, mutexes, etc. if I can.
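
To make the setup concrete, here is roughly what the code looks like (the array size comes from my comments below; function names and the thread pinning are illustrative only):

#include <array>

struct MyStruct
{
    float _a{0};
    float _b{0};
};  // Assume packed, 8 bytes total

std::array<MyStruct, 100> values;       // written by one core, read by another

void writerThread()                     // pinned to its own core, performance sensitive
{
    values[0] = MyStruct{1.0f, 2.0f};   // a single 64-bit store
}

void readerThread()                     // pinned to a different core
{
    MyStruct copy = values[0];          // 64-bit load - will this ever miss an update?
    (void)copy;                         // (placeholder for the real processing)
}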

user997112
  • 64-bit writes are atomic, but only if the value is 64-bit aligned, so you'll need to align your struct to 8 bytes instead of the current 4 – phuclv Feb 01 '20 at 03:26
  • An aligned 64-bit (unsigned) variable "might" get implemented as a single 64-bit register read/write which won't get interrupted; then you can easily extract the two items. You could ensure this with a few lines of assembly language. Otherwise I agree with an answer below that you have to use a lock. – old_timer Feb 01 '20 at 14:19
  • @old_timer An aligned 64-bit variable is guaranteed to be atomic. I checked the Intel manuals. The question is whether/how to ensure the other thread sees the value. – user997112 Feb 01 '20 at 15:43
  • volatile would be the first choice, but does that guarantee the compiler will put the value in memory as desired? You would have to either check regularly, or use asm to ensure it is doing what you want. – old_timer Feb 01 '20 at 16:08

4 Answers


Regardless of whether volatile will be deprecated in the next C++ version - volatile was never designed or intended to be used for multithreading! This is in contrast to Java where volatile means something entirely different (the Java volatile semantics are much closer to those of the C++ atomics).

It would be good to have some more information about the actual problem, i.e., some more context about what you are actually trying to achieve.

Based on your description you have only two threads involved - one reading and one writing - so I would suggest using a single-producer-single-consumer queue. Such a queue can be implemented with only two atomic counters for the head/tail indexes; the values themselves don't have to be atomic and can be of any type (including non-trivially copyable ones).
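
To illustrate the idea, here is a minimal sketch of such a queue (bounded, FIFO, one producer and one consumer; whether this actually fits your problem is exactly what I cannot tell yet):

#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

template <typename T, std::size_t Capacity>
class SpscQueue
{
public:
    bool try_push(const T& value)               // called by the producer thread only
    {
        const auto tail = _tail.load(std::memory_order_relaxed);
        const auto next = (tail + 1) % Capacity;
        if (next == _head.load(std::memory_order_acquire))
            return false;                       // queue is full
        _slots[tail] = value;                   // plain, non-atomic copy
        _tail.store(next, std::memory_order_release);  // publish the new item
        return true;
    }

    std::optional<T> try_pop()                  // called by the consumer thread only
    {
        const auto head = _head.load(std::memory_order_relaxed);
        if (head == _tail.load(std::memory_order_acquire))
            return std::nullopt;                // queue is empty
        T value = _slots[head];
        _head.store((head + 1) % Capacity, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> _slots{};
    std::atomic<std::size_t> _head{0};          // next slot to read
    std::atomic<std::size_t> _tail{0};          // next slot to write
};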

But to understand if this would be a valid solution, I would need more information: Should the items be consumed FIFO or LIFO? What about the array? How large is it? Can it overflow/underflow (i.e., threads try to write/read entries, but the array is full/empty)? How should a full/empty array be handled?

mpoeter
  • An array of 100 elements; each element is two 32-bit floats. One thread writes, the other thread reads. The writing thread needs to be as fast as possible because it's processing a live feed, hence I wish to avoid mutexes and atomics because of the memory barriers. I like volatile because it forces the value to be read from memory (cache), which can then make use of cache coherency. – user997112 Feb 01 '20 at 15:46
  • The array isn't a queue, elements do not move in the array. – user997112 Feb 01 '20 at 15:51
  • You are still leaving out important details. Please try to explain the _problem_ you are trying to solve in as much detail as possible - not just your attempt to solve it. Forget about volatile and cache coherency! volatile is only useful for "unusual" memory, like a memory-mapped port that can be read/written to communicate with e.g. a sensor. For more details on atomics vs volatile see this article from Herb Sutter: https://www.drdobbs.com/parallel/volatile-vs-volatile/212701484 BTW: you don't have to move the elements in the array to use it as a queue. – mpoeter Feb 01 '20 at 16:48
  • I wish to transfer 64 bits from writer thread to reader thread without making it expensive? – user997112 Feb 01 '20 at 22:26
  • Well, I can tell you that you can use simple load/store operations to transfer 64-bit values between threads, but it won't help you much, will it? Whether you have to make the variable atomic or not depends on other factors... I know I am repeating myself, but please provide some more context. Use the [five whys method](https://en.wikipedia.org/wiki/Five_whys) and start with "why do you have to transfer 64 bits from writer thread to reader". If I can see the bigger picture, I can help you find a correct (under the semantics of the C++ memory model) and fast solution. – mpoeter Feb 02 '20 at 16:52
  • I think I can use atomics but with release memory barrier to achieve the same as volatile keyword. – user997112 Feb 03 '20 at 22:51
  • If anything, you need to combine acquire and release operations - release alone will do nothing. But acquire/release operations only order *surrounding instructions* (aka satellite data). However, I am not sure if this is even necessary in your case, but without more information it is impossible to tell. And please forget about volatile - it has *nothing* to do with concurrency/multithreading, and it was *never* designed or intended to be used for it! volatile allows optimizations that are forbidden for atomics, so you might not end up with what you are expecting! – mpoeter Feb 04 '20 at 08:50
  • I am literally transferring 64 bits from one thread to another. I can't elaborate as that's it. volatile would/should force the compiler to generate instructions reading the data from memory (not register). This would then allow me to use the cache coherency protocol – user997112 Feb 14 '20 at 05:12
  • @user997112 you certainly could elaborate, you just choose not to. Why do you want to transfer data between the two threads? What is the purpose behind it? How often do you do it (i.e., how often does the producer write a new value, how often does the consumer try to read a new value)? Is there some synchronization (how does the consumer know if there is a new value)? Does the consumer have to read _every_ value or only the _latest_ value? The answers to all these questions should become clear once we understand the _problem_ you are trying to solve, not just your _attempt_ to solve it. – mpoeter Feb 14 '20 at 09:26
  • @user997112 and no, volatile _will NOT_ give you the guarantees you are looking for!! You are working with C++, so you have to abide to the rules of the language. Since you want to communicate between two threads, you have to use some form of synchronization. The simple way would be via classic locks, but since you say this is performance sensitive, it would probably best to use atomics. – mpoeter Feb 14 '20 at 09:31

Will the MESIF protocol guarantee the second thread sees the writes?

No. This is up to the operating system: if your first (writing) thread gets prioritized and manages to write twice before the second thread can read even the first value, that's a data race and completely operating-system dependent.

Do I need the volatile keyword to tell the compiler another thread might be modifying the memory?

volatile tells the compiler not to optimize the variable away; it does not prevent optimization altogether.

Do I need atomics?

It depends. You're not planning to use mutexes, and you're not planning to use anything else remotely concurrency related, so atomics - which are exactly what's meant for writing across threads - are what's left.

I suggest using std::mutex in combination with std::lock_guard or std::scoped_lock.

I'm aware your title says that you don't want it, but it's really the only way to guarantee the same order of reading and writing every time.
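
A minimal sketch of what I mean, reusing the struct from the question (the array size and helper function names are just placeholders):

#include <array>
#include <cstddef>
#include <mutex>

struct MyStruct
{
    float _a{0};
    float _b{0};
};

std::array<MyStruct, 100> values;
std::mutex valuesMutex;

void writeValue(std::size_t i, MyStruct v)
{
    std::lock_guard<std::mutex> lock(valuesMutex);  // unlocked automatically at scope exit
    values[i] = v;
}

MyStruct readValue(std::size_t i)
{
    std::lock_guard<std::mutex> lock(valuesMutex);
    return values[i];
}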

JohnkaS
  • I don't understand how the OS is involved with MESIF? – user997112 Jan 31 '20 at 22:33
  • The threads are pinned to separate and different CPU cores. The OS doesn't play a role in this....... – user997112 Feb 01 '20 at 00:42
  • Actually it does: it's up to the operating system where and how it distributes CPU cycles. Are you familiar with the suspension and continuation of processes? Say you have 40 processes running on a machine with 8 cores / 16 hardware threads. That means 16 things can happen at the same time, yet you have 40 active processes, so the OS will temporarily suspend and resume processes to be able to run more processes than there are hardware threads. – JohnkaS Feb 01 '20 at 01:31
  • But the threads are pinned to separate and different CPU cores? This question is about cache coherency, Intel CPU architecture, volatile, atomics and memory barriers – user997112 Feb 01 '20 at 01:59

As a C++ developer you should take the low-level functioning of the CPU with a pinch of salt. See this interesting question: 'Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size'

(True) sharing happens at cache-line granularity; from what we can see, the above structure should be modified like this:

#include <atomic>
#include <new>  // std::hardware_constructive_interference_size

struct MyStruct
{
    alignas(std::hardware_constructive_interference_size) std::atomic<float> _a{0};
    alignas(std::hardware_constructive_interference_size) std::atomic<float> _b{0};
};

Accessing variables concurrently always requires the use of std::atomic. Whether your target happens to perform writes in order is of no importance to you in C++; lots of things are going on under the hood. And finally, volatile does not work for this: it has been superseded by std::atomic for inter-thread communication, and parts of it are deprecated in C++20.
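
Usage would then be along these lines (a sketch only; the memory orderings shown are an example, pick what your algorithm really needs):

MyStruct shared;   // one instance visible to both threads

// writer thread
shared._a.store(1.0f, std::memory_order_release);
shared._b.store(2.0f, std::memory_order_release);

// reader thread
float a = shared._a.load(std::memory_order_acquire);
float b = shared._b.load(std::memory_order_acquire);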

Gold
  • volatile does work for the purposes it was intended to be used for; however, atomic reads and writes between threads were never one of those intended purposes. – Jeremy Friesner Feb 01 '20 at 03:43
  • So you don't know the actual meaning of volatile; you should have a look at the generated assembly when the compiler tries to cache the variable on the stack. The volatile keyword is DEPRECATED (C++20) – Gold Feb 01 '20 at 03:51
  • Why atomics and not just volatile? Don't we just want to force the value to be loaded from memory (cache line) and not a register, thereby making use of the CPU cache coherency? – user997112 Feb 01 '20 at 07:05
  • MESIF - has no influence on registers.
  • Volatile - was used for this in the old days, but in newer compilers it is only there to force reads from hardware registers, so it will not work.
  • atomics - as soon as you have two threads that talk to each other you will need atomic operations to read and write.

    std::atomic<MyStruct> whatEver;

If you are lucky you then get

 bool is_lock_free = whatEver.is_lock_free();

and even luckier if

 constexpr bool always_lock_free = std::atomic<MyStruct>::is_always_lock_free; // a static member, not a member function

you will have to do

 MyStruct whatEver = arrayOfWhatEver[x].load(); // one atomic 64-bit load of the whole struct
 auto a = whatEver._a;

to use the atomic operation and not just read the individual members of MyStruct.
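
Putting the pieces together, a sketch could look like this (assuming MyStruct stays trivially copyable so that std::atomic<MyStruct> maps onto a lock-free 64-bit atomic on x86-64; the array size is taken from the comments on the question):

#include <array>
#include <atomic>
#include <cstddef>

struct MyStruct
{
    float _a{0};
    float _b{0};
};  // trivially copyable, 8 bytes

// Expected to hold on x86-64; compilation fails on targets without a lock-free 8-byte atomic.
static_assert(std::atomic<MyStruct>::is_always_lock_free,
              "std::atomic<MyStruct> is not lock-free on this target");

std::array<std::atomic<MyStruct>, 100> arrayOfWhatEver{};

void writeElement(std::size_t x, MyStruct v)
{
    arrayOfWhatEver[x].store(v);                    // one atomic 64-bit store
}

float readA(std::size_t x)
{
    MyStruct whatEver = arrayOfWhatEver[x].load();  // one atomic 64-bit load
    return whatEver._a;                             // then pick the members out of the local copy
}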

Surt