
I have come across some C++03 code that takes this form:

struct Foo {
    int a;
    int b;
    CRITICAL_SECTION cs;
};

// DoFoo::Foo foo_;

void DoFoo::Foolish()
{
    if( foo_.a == 4 )
    {
        PerformSomeTask();

        EnterCriticalSection(&foo_.cs);
        foo_.b = 7;
        LeaveCriticalSection(&foo_.cs);
    }
}

Does the read from foo_.a need to be protected? e.g.:

void DoFoo::Foolish()
{
    EnterCriticalSection(&foo_.cs);
    int a = foo_.a;
    LeaveCriticalSection(&foo_.cs);

    if( a == 4 )
    {
        PerformSomeTask();

        EnterCriticalSection(&foo_.cs);
        foo_.b = 7;
        LeaveCriticalSection(&foo_.cs);
    }
}

If so, why?

Please assume the integers are 32-bit aligned. The platform is ARM.

PaulH
  • Also note that C++11 states that any race condition involving a write is Undefined Behavior. So if you're writing to `foo_.a` in a different thread, then yes, it's UB. (§1.10/4 and §1.10/21) C++03 says nothing about concurrency. – Mysticial Nov 30 '12 at 18:20
  • Use `std::atomic` if you can and don't worry about it. – GManNickG Nov 30 '12 at 18:24
  • Please see the edit. This code is limited to C++03 constructs. – PaulH Nov 30 '12 at 18:26
  • Since you tagged with WinAPI, I'll point out that you can used the Interlocked... functions to do atomic integer reads and writes with slightly less overhead than a critical section. (On Windows, the features in C++11 probably use the Interlocked... functions under the hood.) – Adrian McCarthy Nov 30 '12 at 18:27
  • @AdrianMcCarthy: There is no `InterlockedRead`, unfortunately. AFAIK you can use `InterlockedCompareExchange(v, 0, 0)` at a slight performance cost, or write your own `InterlockedRead` using compiler assumptions and extensions (like `_MemoryBarrier` and the fact that the contract of the function assumes the supplied integer is aligned properly, and ensure the type is such that the platforms Windows run on gives that integer atomic reads). – GManNickG Nov 30 '12 at 18:34
  • @PaulH: Your approach should then be to emulate `std::atomic`'s behaviors for your use case. – GManNickG Nov 30 '12 at 18:40
  • possible duplicate of [Reading interlocked variables](http://stackoverflow.com/questions/779996/reading-interlocked-variables) – Adrian McCarthy Nov 30 '12 at 18:40
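
For reference, a minimal sketch of the emulation GManNickG describes in the comments above, reading a 32-bit value atomically via `InterlockedCompareExchange(v, 0, 0)`. The helper name `AtomicRead` is mine, not a Windows API, and the value is assumed to be a properly aligned `LONG`:

#include <windows.h>

// Returns the current contents of *value atomically, with a full barrier.
// If *value happens to be 0 it is "exchanged" with 0, so nothing is modified;
// either way the previous value is returned.
LONG AtomicRead(volatile LONG* value)
{
    return InterlockedCompareExchange(value, 0, 0);
}

The matching atomic write can be done with `InterlockedExchange` (see the sketch under the last answer below).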

5 Answers


Technically yes; in practice, no on many platforms. First, let us assume that int is 32 bits (which is pretty common, but not nearly universal).

It is possible that the two words (16-bit parts) of a 32-bit int will be read or written separately. On some systems, they will be read separately if the int isn't aligned properly.

Imagine a system where you can only do 32-bit-aligned 32-bit reads and writes (and 16-bit-aligned 16-bit reads and writes), and an int that straddles such a boundary. Initially the int is zero (i.e., 0x00000000).

One thread writes 0xBAADF00D to the int, the other reads it "at the same time".

The writing thread first writes 0xBAAD to the high word of the int. The reader thread then reads the entire int (both high and low) getting 0xBAAD0000 -- which is a state that the int was never put into on purpose!

The writer thread then writes the low word 0xF00D.
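
To make the alignment caveat concrete, here is a hedged illustration of my own (not part of the answer): it uses the `#pragma pack` extension, accepted by MSVC, GCC, and Clang, to force an int onto a misaligned offset. On hardware that can only issue naturally aligned accesses, such an int cannot be read or written in one operation, which is what makes the torn value above possible.

#include <cstddef>
#include <cstdio>

#pragma pack(push, 1)
struct Packed {
    char pad[3];  // three bytes of padding...
    int  value;   // ...so 'value' usually does not sit on a 4-byte boundary
};
#pragma pack(pop)

int main()
{
    Packed p = { { 0, 0, 0 }, 0 };
    // Typically prints a non-zero remainder, i.e. 'value' is misaligned.
    unsigned long misalignment = static_cast<unsigned long>(
        reinterpret_cast<std::size_t>(&p.value) % 4);
    std::printf("value address mod 4 = %lu\n", misalignment);
    return 0;
}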

As noted, on some platforms all 32 bit reads/writes are atomic, so this isn't a concern. There are other concerns, however.

Most lock/unlock code includes instructions to the compiler to prevent reordering across the lock. Without that prevention of reordering, the compiler is free to reorder things so long as the program behaves "as-if" it had run that way in a single-threaded context. So if you read a then b in code, the compiler could read b before it reads a, so long as it doesn't see an in-thread opportunity for b to be modified in that interval.

So possibly the code you are reading is using these locks to make sure that the read of the variable happens in the order written in the code.
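
As a rough sketch of that reordering point (the globals below are hypothetical and mine, and the snippet only shows a transformation the compiler is allowed to make; it does not demonstrate it at run time):

// Hypothetical shared variables, written by some other thread.
int shared_a = 0;
int shared_b = 0;

int ReadBoth()
{
    // As written: read shared_a, then shared_b.
    int x = shared_a;
    int y = shared_b;
    // With no lock, atomic, or volatile involved, the compiler may read
    // shared_b before shared_a, or keep either value in a register across
    // calls, because a single-threaded program cannot tell the difference.
    return x + y;
}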

Other issues are raised in the comments below, but I don't feel competent to address them: cache issues, and visibility.

Yakk - Adam Nevraumont
  • @dmajj, on many platforms, yes, but not necessarily on all platforms. – Dirk Holsopple Nov 30 '12 at 18:23
  • You forgot to mention caching effects. –  Nov 30 '12 at 18:23
  • On all current major CPU architectures (including ARM), reads and writes of naturally aligned `int`s and pointers are atomic. Note that you still need the critical section on the write because of compiler and CPU ordering issues. – Cameron Nov 30 '12 at 18:25
  • @dmaij The problem is that it breaks down if the variable is promoted to a register. Furthermore, the relaxed memory model of modern processors can do weird things if you don't force a memory barrier via atomic or critical section. – Mysticial Nov 30 '12 at 18:29
  • @dmaij: Is it really up to opinion? Unless the definition of correctness in this case is "I don't need this to be accurate or well-defined" (which can be the case *sometimes* but certainly not for the overwhelming majority nor for OP), you either are factually writing working code or not. – GManNickG Nov 30 '12 at 18:35
  • @VladLazarenko -- I don't feel sufficiently qualified to describe how they would screw you up in particular (not well enough to include an example). – Yakk - Adam Nevraumont Nov 30 '12 at 19:05
  • Being atomic isn't sufficient. You also have to worry about visibility and compiler re-ordering. – Pete Becker Nov 30 '12 at 20:07
  • As @PeteBecker says, you need at least `volatile` to safely roll your own atomic reads. See [Who's afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/) re: how the Linux kernel rolls its own atomics, using de-facto compiler support of the same kind you're depending on to do this in C++03. There are some pretty tricky gotchas, besides the well-known ones like `while(!flag){ spin; }` optimizing into `if(!flag) { while(42){} }`. ([MCU programming - C++ O2 optimization breaks while loop](https://electronics.stackexchange.com/a/387478)) – Peter Cordes Jun 14 '22 at 09:23
  • Depending on the surrounding code and what your compiler had for breakfast, it might make the same asm for a read of a plain `int` as it would have with `volatile`, or with `std::atomic` with `memory_order_relaxed`. So this definitely does happen to work in a lot of cases, and in 2012 it probably made sense not to use very new C++11 features. But in 2022, there's a lot less excuse for writing unsafe code. At least use `*(volatile T*)&foo` if you're stuck in a codebase without std::atomic for shared vars. – Peter Cordes Jun 14 '22 at 09:26
  • @PeterCordes -- I did **not** say that you need at least `volatile`. `volatile` **might** be appropriate, but only if your compiler makes that promise. – Pete Becker Jun 14 '22 at 11:48
  • @PeteBecker: Right sorry, as *you* said, a hardware guarantee that load/store of `int` size being atomic isn't sufficient. The part about `volatile` working on compilers which at least de-facto support that is on me. That is normally the case for GCC and clang at least. (related: [Which types on a 64-bit computer are naturally atomic in gnu C and gnu C++?](https://stackoverflow.com/a/71867102) - sometimes compilers choose to split up a 64-bit store on a RISC machine with a constant like `0xdeadbeefdeadbeef`. IIRC I found mention of GCC sometimes splitting even with volatile, but maybe a bug – Peter Cordes Jun 14 '22 at 12:47
  • Actually that splitting constants thing will only affect stores, and this Q&A is about loads. Err, but not necessarily Interlocked stores, it's including plain stores inside critical sections, so actually Nate's example on [Which types on a 64-bit computer are naturally atomic in gnu C and gnu C++? -- meaning they have atomic reads, and atomic writes](https://stackoverflow.com/a/71867102) is a perfect counterexample to loads being safe. The write would also have to make sure it was done atomically. With GCC/clang, making the variable itself volatile would work (affecting the store as well), – Peter Cordes Jun 14 '22 at 12:50

Looking at this, it seems that ARM has quite a relaxed memory model, so you need a form of memory barrier to ensure that writes in one thread are visible when you'd expect them in another thread. So what you are doing, or else using std::atomic, seems likely to be necessary on your platform. Unless you take this into account, you can see updates out of order in different threads, which would break your example.
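
To make the barrier placement concrete, here is a hedged WinAPI-flavoured sketch; the variable names are mine, and `MemoryBarrier()` is the Win32 full-barrier macro (a DMB instruction on ARM). It only illustrates where a barrier has to sit; in real code the critical section, or `std::atomic` where available, already provides this.

#include <windows.h>

// Hypothetical publish/consume pair; 'payload' stands in for data like foo_.b.
volatile LONG data_ready = 0;
int payload = 0;

void Producer()
{
    payload = 42;     // write the data first
    MemoryBarrier();  // full barrier: the data must be visible before the flag
    data_ready = 1;   // then publish the flag
}

void Consumer()
{
    if (data_ready != 0)  // saw the flag...
    {
        MemoryBarrier();  // ...barrier again so 'payload' is not read "early"
        int local = payload;
        (void)local;      // use the value
    }
}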

jcoder

I think you can use C++11 to ensure that integer reads are atomic, using (for example) std::atomic<int>.
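
A minimal sketch of that suggestion applied to the question's pattern (C++11 only; the names mirror the question, but this is an illustration, not the original code):

#include <atomic>

struct Foo {
    std::atomic<int> a;  // reads and writes of 'a' are now atomic and ordered
    int b;               // 'b' still needs the critical section (or a mutex)
};

Foo foo_;

void Foolish()
{
    if (foo_.a.load() == 4)  // atomic read; the default ordering is seq_cst
    {
        // PerformSomeTask();
        // ... lock around the write to foo_.b as before ...
    }
}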

Bartek Banachewicz

The C++ standard says that there's a data race if one thread writes to a variable at the same time as another thread reads from that variable, or if two threads write to the same variable at the same time. It further says that a data race produces undefined behavior. So, formally, you must synchronize those reads and writes.

There are three separate issues when one thread reads data that was written by another thread.

First, there is tearing: if writing requires more than a single bus cycle, it's possible for a thread switch to occur in the middle of the operation, and another thread could see a half-written value; there's an analogous problem if a read requires more than a single bus cycle.

Second, there's visibility: each processor has its own local copy of the data that it's been working on recently, and writing to one processor's cache does not necessarily update another processor's cache.

Third, there are compiler optimizations that reorder reads and writes in ways that would be okay within a single thread, but will break multi-threaded code.

Thread-safe code has to deal with all three problems. That's the job of synchronization primitives: mutexes, condition variables, and atomics.
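
As a hedged C++11 sketch of "synchronize those reads and writes", here is the question's function with a `std::mutex` standing in for the Win32 critical section (the names mirror the question; `PerformSomeTask` is a stub):

#include <mutex>

struct Foo {
    int a;
    int b;
    std::mutex m;  // plays the role of the CRITICAL_SECTION
};

Foo foo_;

void PerformSomeTask() { /* stand-in for the real work */ }

void Foolish()
{
    int a;
    {
        std::lock_guard<std::mutex> lock(foo_.m);  // protect the read as well
        a = foo_.a;
    }

    if (a == 4)
    {
        PerformSomeTask();

        std::lock_guard<std::mutex> lock(foo_.m);  // and the write
        foo_.b = 7;
    }
}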

Pete Becker

Although the integer read/write operation itself will most likely be atomic, compiler optimizations and the processor cache will still give you problems if you don't do it properly.

To explain - the compiler will normally assume that the code is single-threaded and make many optimizations that rely on that. For example, it might change the order of instructions. Or, if it sees that the variable is only written and never read, it might optimize it away entirely.

The CPU will also cache that integer, so if one thread writes it, the other one might not get to see it until a lot later.

There are two things you can do. One is to wrap it in a critical section like in your original code. The other is to mark the variable as volatile. That will signal the compiler that this variable will be accessed by multiple threads and will disable a range of optimizations, as well as placing special cache-sync instructions (aka "memory barriers") around accesses to the variable (or so I understand). Apparently this is wrong.

Added: Also, as noted by another answer, Windows has Interlocked APIs that can be used to avoid these issues for non-volatile variables.
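
A hedged sketch of what that might look like for a single 32-bit value (the variable and function names are mine; the value must be `LONG`-sized and properly aligned for the Interlocked APIs):

#include <windows.h>

volatile LONG shared_value = 0;

void Writer()
{
    InterlockedExchange(&shared_value, 7);  // atomic store plus full barrier
}

LONG Reader()
{
    // Comparing and "exchanging" with 0 never modifies the value, but returns
    // the current contents atomically.
    return InterlockedCompareExchange(&shared_value, 0, 0);
}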

Vilx-
  • According to many, marking a variable as `volatile` was not really intended for variables that may be modified on another thread, though the Microsoft documentation for volatile suggests that it is, and the question is tagged WinAPI. – Adrian McCarthy Nov 30 '12 at 18:32
  • @AdrianMcCarthy - Well, I'm not an expert. :) I just know that this is about the only use of `volatile` that I can think of. – Vilx- Nov 30 '12 at 18:33
  • Sorry to -1, but we don't need to spread the lie that `volatile` has anything to do with multithreading any more. If it did, C++11 wouldn't have added `std::atomic`. **Cut any idea you have that volatile is helpful for multithreading.** (Disclaimer: some compilers, notably MSVC, gave now-deprecated extensions to volatile which make my previous language-based claim false; this was seen in hindsight as a mistake.) – GManNickG Nov 30 '12 at 18:37
  • More specifically, this sentence is wrong: "That will signal the compiler that this variable will be accessed by multiple threads". All it says is "reads and writes to this variable are *observable behavior* (a technical term), so make sure you do it". It says nothing about ordering, atomicity, or any other guarantees you need for correct multithreaded code. – GManNickG Nov 30 '12 at 18:39
  • @GManNickG - OK, thank you for correcting me. Can you point me to any articles (or explain yourself) why it is wrong, and then what is it exactly that `volatile` does? – Vilx- Nov 30 '12 at 19:33
  • @Vilx-: No problem. A good start is [here](http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/), and a PDF (so I won't link) titled "C++ and the Perils of Double-Checked Locking". – GManNickG Nov 30 '12 at 20:57