2

I have 2 threads, A and B. Thread A want to let B know how many records of data it has received. In modern C++, we can use Atomic or CAS or Mutex to modify the counter. How ever, neither of them is fast enough for me.

I am thing about, use a int64_t without a lock to share data counter between threads. Only thread A can modify it, other threads can only read it. I don't know whether it is safe to do so.

I know in x86-64 machines, a 64 bit int can be written with one single asm store. So I think it should be safe. And I write the following code to check it. If the method is not safe, cnt will not always increase because we will read wrong values from that 64bit memory.

#include <iostream>
#include <vector>
#include <thread>
using namespace std;

int main() {
    int64_t cnt = 0;
    vector<thread> threadVec;
    for (int i=0; i<10; i++){
        threadVec.emplace_back(move(thread([&cnt](){
            int64_t prev = 0, tmp = 0;
            while(prev != INT64_MAX){
                tmp = cnt;
                // If it is not safe, tmp will not increase all the time.
                if (tmp < prev) cout << "Error cnt declined" << endl;
                // if (tmp % 1000000 == 0) cout << tmp << endl;
                prev = tmp;
            }
        })));
    }
    this_thread::sleep_for(chrono::seconds(1));
    while(cnt < INT64_MAX) cnt++;
    for (auto& t : threadVec)
        t.join();
    return 0;
}

So is the idea correct?

phuclv
  • 37,963
  • 15
  • 156
  • 475
54138qqqq
  • 67
  • 1
  • 8
  • 4
    No. It's undefined behaviour. – Passer By Jul 26 '22 at 10:52
  • If one thread can *ever* modify an non-atomic object at the time when another thread reads that object (i.e. reading and writing are unsequenced) the behaviour is undefined. You can use `std::atomic` or you can explicitly use synchronisation primitives (e.g. a mutex that each thread will lock, so both threads wait for the lock before proceeding to access the object). Mutexes (and other synchronisation primitives) can be used to synchronise access to primitives like `int64_t` but are often considered better for larger objects (e.g. with several members to be collectively read/changed). – Peter Jul 26 '22 at 11:04
  • You have an answer already, but I'd like to add one note: You have two threads that compete for a resource, which always increases synchronization overhead. Instead, try to make the threads cooperate. – Ulrich Eckhardt Jul 26 '22 at 12:30
  • So can cooperating threads avoid competing for resources? That will be much more efficient, but can you give an example of this? I cannot think of one though, I always lock a resource or use Atomic @UlrichEckhardt – 54138qqqq Jul 27 '22 at 02:16
  • There is no general rule how to do that, it depends on the task. As an example, repeatedly checking for a job in a queue (which requires a lock) and thes sleeping in between is much better implemented using a condition variable. – Ulrich Eckhardt Jul 27 '22 at 05:05
  • @UlrichEckhardt Understood! I though about using condition variable, but that might increase latency. Actually it is a trading program, one process is retrieving data from broker, and another processed use the data to calculate trading signal. It is better to read the counter repeatedly in a loop to know data arrival immediately. – 54138qqqq Jul 27 '22 at 08:44

2 Answers2

2

No, it's undefined behavior.

Make it a std::atomic<int64_t>, instead. This is what std::atomic is for.

Sam Varshavchik
  • 114,536
  • 5
  • 94
  • 148
  • I am now using mutex to protect the counter. because thread B need to read the counter frequently (maybe in a while loop). Atomic will lock the cache line. Will atomic definitely be faster? – 54138qqqq Jul 26 '22 at 10:55
  • Atomic would be faster than a mutex, but atomic is not a replacement for a mutex. Using mutex also does several other things, such as synchronize the threads, which an atomic does not do. Thread safety is not the same thing as proper thread synchronization. Its entirely possible that whatever you're doing requires full thread synchronization, and a mutex, as using atomic will result in undefined behavior for reasons other than thread-safety related. – Sam Varshavchik Jul 26 '22 at 10:57
  • atomics are cool, but hard to use properly. IMO it is to early for him to use it (note invalid use of `move`). – Marek R Jul 26 '22 at 10:59
  • Thanks guys. Solved. I will keep it mutex. – 54138qqqq Jul 26 '22 at 11:01
  • @54138qqqq: Yes, `std::atomic` will definitely be faster as the read side is read only so it can scale perfectly. Safe if readers just need to see some recent counter value. (e.g. this is how you'd implement a timestamp that was updated by a timer interrupt.) If there's only one writer, don't use `count++`, use `count.store(count.load(relaxed) + 1, relaxed)`. (Or acquire and/or release stores if the counter indicates anything about other data being updated, like an array index.) – Peter Cordes Oct 11 '22 at 05:21
-4

From Quara:

For x64 machines with 64-bit compiler, the answer is yes (assume the int64 or uint64 is properly aligned to 8 bytes.). Plain store and load of int64 or uint64 is atomic. Yet if you are multi-threading, you may care about memory order, so you should use std::atomic.

By the way, using volatile is helpful to avoiding a thread caching it in its register, which stops other threads from reading the correct value.

For x86 machines, it is not atomic.

54138qqqq
  • 67
  • 1
  • 8
  • The original ans is here: https://www.quora.com/On-a-64-bit-machine-are-reads-and-writes-to-uint64_ts-atomic – 54138qqqq Oct 11 '22 at 02:21
  • 1
    Incorrect, counterexample for AArch64 [Which types on a 64-bit computer are naturally atomic in gnu C and gnu C++? -- meaning they have atomic reads, and atomic writes](https://stackoverflow.com/q/71866535) where GCC compiles `x = 0xdeadbeefdeadbeef` to two separate 32-bit `str` instructions (because of the pattern in the constant.) If you use `volatile` some compilers assume you want atomicity, but it's still UB in ISO C++. Use `std::atomic` with `std::memory_order_relaxed` if you want atomicity but not ordering, compiles to about the same asm as `volatile`. – Peter Cordes Oct 11 '22 at 05:18
  • Right, this answer is a common misconception. As I wrote in the answer linked by Peter, you are quite right that a plain 64-bit load or store instruction is atomic on all common 64-bit architectures, such as for instance x86-64. The gap in your logic is that the compiler doesn't have to *emit* a plain 64-bit load or store instruction. – Nate Eldredge Oct 11 '22 at 05:36