
I'm implementing a 'sequence lock' class to allow locked write and lock-free reads of a data structure.

The struct that contains the data also holds the sequence value, which is incremented twice while a write takes place: once before the writing starts, and once after the writing is completed. The writer runs on different threads than the reader(s).

This is what the struct that holds a copy of the data, and the sequence value looks like:

template<typename T>
struct seq_data_t
{
    seq_data_t() : seq(0) {}
    int seq;                     // <-- should this be 'volatile int seq;'?
    T data;
};

The whole sequence lock class holds N copies of this structure in a circular buffer. Writer threads always write over the oldest copy of the data in the circular buffer, then mark it as the current copy. The writing is mutex locked.

The read function does not lock. It attempts to read the 'current' copy of the data. It stores the 'seq' value before reading. Then it reads data. Then it reads the seq value again, and compares it to the value it read the first time. If the seq value has not changed, the read is deemed to be good.

Since the writer thread could change the value of 'seq' while a read is occurring, I'm thinking that the seq variable should be marked volatile, so that the read function will explicitly read the value after it reads the data.

The read function looks like this: It will be on threads other than the writer, and perhaps several threads.

    void read(std::function<void(T*)> read_function)
    {
        for (;;)
        {
            seq_data_t<T>* d = _data.current; // get current copy
            int seq1 = d->seq;      // store starting seq no
            if (seq1 % 2)           // if odd, being modified...
                continue;           //     loop back

            read_function(&d->data);  // call the passed in read function
                                      // passing it our data.


//??????? could this read be optimized out if seq is not volatile?
            int seq2 = d->seq;      // <-- does this require that seq be volatile?
//???????

            if (seq1 == seq2)       // if still the same, good.
                return;             // if not the same, we will stay in this
                                    // loop until this condition is met.
        }
    }

Questions:

1) must seq be volatile in this context?

2) in the context of a struct with multiple members, is only the volatile-qualified member volatile, and not the other members? i.e. is only 'seq' volatile if I mark only it as volatile within the struct?

ttemple
  • `volatile` does absolutely nothing to make code thread safe - it's a blatant abuse of what it is supposed to be used for – UnholySheep Feb 12 '18 at 15:18
  • Unless something in the documentation for your platform says so, `volatile` is never required in multithreaded code. – David Schwartz Feb 12 '18 at 15:19
  • I'm not trying to make the code thread safe with volatile. The writing (which is not shown here) is locked. This is providing lock free reads. – ttemple Feb 12 '18 at 15:19
  • *"Since the writer thread could change the value of 'seq' while a read is occurring, I'm thinking that the `seq` variable should be marked volatile"* - this directly contradicts your last comment – UnholySheep Feb 12 '18 at 15:21
  • The issue is whether the reader, which will read the seq value twice, needs it to be marked volatile, so the second read does not get optimized out. – ttemple Feb 12 '18 at 15:21
  • @ttemple it should be `std::atomic`. – Mário Feroldi Feb 12 '18 at 15:25
  • @UnholySheep Could you elaborate? According to the documentation, `volatile` does introduce a memory barrier which seems like what OP is trying to achieve. – Rotem Feb 12 '18 at 15:31
  • does std::atomic assure that two sequential reads will not be optimized out? The issue here is not writing seq, but reading it twice in succession, while another thread could be changing it, unbeknownst to the compiler. I already have the entire write process mutex locked, including the incrementing of seq. – ttemple Feb 12 '18 at 15:33
  • @MárioFeroldi `std::atomic` would only guarantee that any single reading of `seq` does not return a corrupted value. IIRC this is already guaranteed if `sizeof(int)` is equal or smaller than 32-bits, i.e. if it is natively a single instruction. – Rotem Feb 12 '18 at 15:40
  • @Rotem the standard says nothing about small types being atomic. That's an architecture characteristic. `std::atomic` assures you that it's atomic across all implementations. – Mário Feroldi Feb 12 '18 at 15:43
  • @MárioFeroldi Fair enough. I understand now that you meant it as an additional comment and not as a solution to the question. – Rotem Feb 12 '18 at 15:45
  • And @ttemple's issue won't be solved with only making `seq` atomic. Reading from memory while it's being written to is undefined behavior. – Mário Feroldi Feb 12 '18 at 15:45
  • Guys, since the OP is trying to implement Seqlock, i.e. some kind of "hack" to juice a bit more of performance, what makes you think it is about portability? Please do not answer "volatiles are not portable" again and again... – Andriy Berestovskyy Feb 12 '18 at 16:50
  • @Rotem: Visual Studio documents `volatile` as a memory barrier. That is correct, as long as you restrict yourself to Visual Studio. It's not at all portable. – MSalters Feb 12 '18 at 17:33

6 Answers


Don't use volatile; use std::atomic<>. volatile is designed and meant for interacting with memory-mapped hardware, while std::atomic<> is designed and meant for thread synchronization. Use the right tool for the job.

Features of good std::atomic<> implementations:

  • They are lockless for standard integer types (everything up to long long, usually).

  • They work with any data type, but will use a transparent mutex for complex data types.

  • If std::atomic<> is lockless, it inserts the correct memory barriers/fences to achieve correct semantics.

  • Manipulations of std::atomic<> cannot be optimized away, they are designed for inter-thread communication after all.

cmaster - reinstate monica
  • since the OP is trying to implement Seqlock, i.e. some kind of "hack" to juice a bit more of performance, what makes you think it is about portability? – Andriy Berestovskyy Feb 12 '18 at 16:46
  • @AndriyBerestovskyy What makes you think a performance hack shouldn’t be portable if possible (which this one is, as far as I can see)? – Konrad Rudolph Feb 12 '18 at 17:12
  • @AndriyBerestovskyy I don't think this is about portability. It is as much about speed as it is about correctness. If you want to implement a fast Seqlock, you need to use an atomic type, otherwise you may get nonsensical reads of the sequence number, **and atomic types are the fastest way to ensure valid reads of the sequence number**. – cmaster - reinstate monica Feb 12 '18 at 17:13
  • @cmaster - does the atomic type guarantee that the second read will not be optimized out? – ttemple Feb 12 '18 at 17:19
  • @cmaster on most platforms aligned integer reads are atomic, but `atomic<>`s are the fastest and portable way to make sure, agreed. My comment was mainly due to "They work with any data type, but will use a transparent mutex for complex data types" I guess that is what OP was trying to avoid... – Andriy Berestovskyy Feb 12 '18 at 17:28
  • @ttemple yes, that is guaranteed for atomics. – Andriy Berestovskyy Feb 12 '18 at 17:28
  • @ttemple a compiler might attempt to optimize away an atomic variable that has automatic storage and is not referenced outside of its scope. Otherwise, [load and store operations should not be optimized away](https://godbolt.org/g/o1xQcY). – Mário Feroldi Feb 12 '18 at 17:29
  • @ttemple As I said, atomics were **designed specifically** for inter-thread communication. And this implies that the compiler must not a) optimize away reads/writes, or b) invent reads/writes. If it did, you would not be able to produce valid parallel code that relies on atomics. – cmaster - reinstate monica Feb 12 '18 at 17:34

As said in *Is volatile required here* — you should not use volatile for inter-thread synchronization. Here is why (from the C++ standard):

[..] volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation.[...]

What volatile doesn't do is ensure that the sequence of operations (and especially memory reads and writes) in one thread is visible in the same order to other threads (due to the superscalar architecture of modern CPUs). For this you need memory barriers or memory fences (different names for the same thing).

bartop
  • That's not what's asked, the OP is concerned about _incorrect compiler optimizations_, not about thread race conditions, nor about instruction sequencing. This knee-jerk answer as soon as someone puts "threads" and "volatile" in the same question is getting really tiresome. – Lundin Feb 12 '18 at 16:08
  • @Lundin if OP is concerned about some compiler's behavior upon some language feature, then they shouldn't ask whether it's needed or appropriate in some specific thread code, which inevitably brings the thread-safety discussions. – Mário Feroldi Feb 12 '18 at 16:18
  • since the OP is trying to implement Seqlock, i.e. some kind of "hack" to juice a bit more of performance, what makes you think it is about portability? – Andriy Berestovskyy Feb 12 '18 at 16:49
  • @MárioFeroldi How do you implement threads without callback functions? The issue here is the use of a callback function, not the use of threads. – Lundin Feb 12 '18 at 16:52
  • I'm not using it for synchronization. I need to make sure that the second read of seq is not optimized away, because the value might have been changed by a writer. If the second read of seq shows that its value has changed, the loop repeats and a new 'current' is returned, and the read tries again, until seq does not change during the read. The likelihood of this ever happening in my case is nil, but it is a check that has to be done to assure that 'data' was not being written while being read. – ttemple Feb 12 '18 at 17:14
  • @ttemple you should also make sure those reads are not reordered by the compiler... – Andriy Berestovskyy Feb 12 '18 at 17:32

1) must seq be volatile in this context?

Sure, most probably the second read from seq will be optimized out with -O3. So yes, you should hint to the compiler that seq might be changed elsewhere (i.e. in another thread) with the volatile keyword.

For the x86 architecture it would be enough, because the x86 memory model is (almost) sequentially consistent, as described on Wikipedia.

For portability, you better use atomic primitives.

2) in the context of a struct with multiple members, are only the volatile qualified variable volatile, and not the other members? i.e. is only 'seq' volatile if I only mark it volatile within the struct?

No, the data should be marked as volatile too (or, again, you should use atomic primitives). Basically, the loop:

for (;;) {
    seq1 = d->seq;
    read_data(d->data);
    seq2 = d->seq;
    if (seq1 == seq2)
        return;
}

is equivalent to:

read_data(d->data);
return;

Because the only observable effect in the code is the read_data() call.

Please note that most likely with -O3 the compiler will reorder your code quite extensively. So even for the x86 architecture you will need compiler barriers between the first seq read, the data read, and the second seq read, i.e.:

    for (;;)
    {
        seq_data_t<T>* d = _data.current;
        int seq1 = d->seq;
        COMPILER_BARRIER();
        if (seq1 % 2)
            continue;

        read_function(&d->data);
        COMPILER_BARRIER();
        int seq2 = d->seq;
        if (seq1 == seq2)
            return;
    }

The most lightweight compiler barrier is:

 #define COMPILER_BARRIER() asm volatile("" ::: "memory")

For C++11 you can use atomic_signal_fence() instead.

Overall, it is safer to use std::atomic<>: it is more portable and not as tricky as juggling volatiles and compiler barriers...

Please also have a look at Herb Sutter's presentation called "atomic<> Weapons", which explains compiler and other memory barriers as well as atomics: https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2

Andriy Berestovskyy
  • C++ has an [API for memory order](http://en.cppreference.com/w/cpp/atomic/memory_order): *std::memory_order specifies how regular, non-atomic memory accesses are to be ordered around an atomic operation*. It should be used rather than your `COMPILER_BARRIER` hack. – Mário Feroldi Feb 12 '18 at 15:48
  • @MárioFeroldi sure, that's what I said in my answer at least twice -- use atomic<>. I completely agree with that, but those hacks and `volatile` are needed to fix the existing code on the x86 architecture... It will not be portable, but it will work. – Andriy Berestovskyy Feb 12 '18 at 15:52
  • This was the most comprehensive and accurate answer. Messing with Compiler Explorer I was able to see the effect of volatile and std::atomic, and everything you said was correct. volatile alone requires the barrier. std::atomic seems to give me everything I need without the combination of volatile and the barrier, exactly as you stated. Thanks. – ttemple Feb 12 '18 at 18:15
  • @MárioFeroldi: the 2nd load of `seq` has to wait until after the load of `T data`. `std::memory_order_acquire` can't give us that *unless we have `atomic data`* and do an acquire load on *that*. (And in theory we should be doing that to avoid data-race UB, but that would defeat the whole purpose of a seqlock for avoiding locks for types too wide for the HW to load atomically). So **we actually want a `std::atomic_thread_fence(std::memory_order_acquire)`, i.e. a load barrier.** That's still a compile-time-only barrier on x86, but actually correct on weakly-ordered ISAs. – Peter Cordes May 20 '19 at 03:47
  • Related: [Implementing 64 bit atomic counter with 32 bit atomics](//stackoverflow.com/q/54611003) is an attempt at a C++11 seqlock with commentary on what's safe in asm, and on making the data `volatile` vs. using GNU C memory barriers to get nice asm that can do wider copies. – Peter Cordes Jun 11 '19 at 23:40

If code is to be portable, volatile is never appropriate unless dealing with memory-mapped hardware. I repeat, never appropriate. Microsoft Visual C++, (x86 or x86/64), using the default compiler flags, adds some memory-order guarantees that are not in the standard. So using that compiler, with the non-standard behavior turned on, volatile might work for some multi-threading operations.

Use the standard multi-threading support, such as std::atomic, std::mutex, std::condition_variable, etc.

Jive Dadson
  • I didn't see "Windows PC" anywhere in the question. There's a whole lot of situations where `volatile` is perfectly appropriate, most notably in various embedded systems. `volatile` is of course never appropriate for the purpose of thread safety, but that's another topic. – Lundin Feb 12 '18 at 16:11
  • It's only appropriate for dealing with hardware. That's what Bjarne Stroustrup says, and he knows his stuff. If you know of a counter-example, tell us (and him). The bit about VC++ is bonus info. – Jive Dadson Feb 12 '18 at 16:14
  • Sorry but Bjarne Stroustrup doesn't even know about the format of `main()` in C++ for freestanding systems - his C++ FAQ has been incorrect in regard of freestanding systems since 1998 and still is. I kind of doubt that he even knows about the existence of such systems in the first place. – Lundin Feb 12 '18 at 16:19
  • @Lundin Are you *seriously* suggesting that Stroustrup doesn’t know about freestanding systems?! Unlikely (rather, he probably just doesn’t care about them). Anyway, Stroustrup of course doesn’t say that `volatile` is only used on VC++. – Konrad Rudolph Feb 12 '18 at 16:34
  • @JiveDadson since the OP is trying to implement Seqlock, i.e. some kind of "hack" to juice a bit more of performance, what makes you think it is about portability? – Andriy Berestovskyy Feb 12 '18 at 16:37
  • @KonradRudolph Read for yourself: http://www.stroustrup.com/bs_faq2.html#void-main. This is all of course completely incorrect, he is incorrectly citing a sub-chapter - this is a common PC programmer mistake. The truth is that C allows and has always allowed implementation-defined forms of main(). C++ is the same, though slightly more strict saying that if the function is named main(), it must have one of the specified formats. Given such embarrassing mistakes, I wouldn't exactly use Stroustrup as some sort of canon for freestanding systems. – Lundin Feb 12 '18 at 16:58
  • @Lundin You’re misrepresenting my comment. I didn’t say to use Stroustrup as canon for freestanding systems. In fact, I said that he [almost certainly] doesn’t care about them and intentionally ignores them. That doesn’t mean he doesn’t *know* about them, which I still find exceedingly unlikely. Anyway, please help me find the error: I don’t see what’s wrong with the text. `void main()` is simply invalid in C++, regardless of “freestanding” vs. “hosted”. – Konrad Rudolph Feb 12 '18 at 17:06
  • @KonradRudolph The error is that for C he is citing the sub-chapter for hosted systems, completely ignoring the parent chapter. And in C++ he ignores half of the cited chapter. Likely because those parts would prove that his arguments are incorrect. – Lundin Feb 12 '18 at 17:15
  • @Lundin No sorry you lost me. The section — and his argument — are purely about valid signatures for `main`, not about alternative (implementation-defined) startup methods. `void main()` is invalid in both hosted and freestanding C++, nothing in the rest of the chapter would prove his argument wrong. The distinction is completely irrelevant for this section (for C++; I don’t know about C). – Konrad Rudolph Feb 12 '18 at 17:16
  • @KonradRudolph The format of the function called at start-up in a freestanding system is _always_ implementation-defined in C _and_ C++. There is nothing preventing a C implementation from using the form `void main (void)`, which is good, _since this is the only format that makes sense in bare metal embedded systems or OS_. Whereas C++ explicitly does not allow the function `main` to make sense, so it cannot be used for such systems. This leaves the C++ fan boy with 3 possible options: 1) use C, 2) use an idiotic form of main that returns `int` to la-la-land, or 3) use non-standard extensions. – Lundin Feb 13 '18 at 10:40
  • @Lundin That was some admirable goal post moving you performed there. Anyway, I don’t see anything wrong with your options (1) and (3) and I’ll raise you (4): use a different name for the freestanding entry point (which is probably the alternative preferred by the authors of the C++ standard). – Konrad Rudolph Feb 13 '18 at 10:45

The actual problem is that reading from some memory (data in this case) while it's being written to is described as a data race, and therefore the behavior of the program is undefined. Even if you make seq atomic, reading from data will still cause a data race. One possible correct approach is to lock on read as well.

Answering your question of whether volatile prevents the read from seq being optimized out: the compiler won't remove either read of seq, but that doesn't solve anything, because seq is still prone to data races, resulting in undefined behavior as well. That's not what volatile is meant for, so don't abuse it.

Mário Feroldi
  • "reading from data will still cause a data race" -- not true. The algorithm the original author is trying to implement is called Seqlock, here is the Wikipedia article: https://en.wikipedia.org/wiki/Seqlock I know, that is another "hack", but some people need those... – Andriy Berestovskyy Feb 12 '18 at 16:20
  • @AndriyBerestovskyy that's not portable, and my answer is talking from a standard level. – Mário Feroldi Feb 12 '18 at 16:22
  • Why not? Seqlocks are used in the Linux kernel, so I am pretty sure it is portable to quite a lot of architectures... – Andriy Berestovskyy Feb 12 '18 at 16:26
  • @Andriy Berestovskyy - What characterizes seqlock as a hack? If it is a hack, it is a very beautiful hack when you need it. – ttemple Feb 12 '18 at 17:16
  • @ttemple that was a reference to Mario's comment on my answer... I do believe the question is about performance, not portability, and I do believe Seqlocks are portable... – Andriy Berestovskyy Feb 12 '18 at 17:20
  • @Andriy.. Correct. Portability is not really an issue, but I have no non-standard code in my implementation. I attempt to write portable code, even though my code is effectively stuck on the platform we use. Performance is an issue, and lock-less reads are very beneficial in my case. They also help prevent possible gridlocks when a client runs amok. – ttemple Feb 12 '18 at 17:27
  • @ttemple sure, I do understand this, but most of the answers are down to "do not use volatiles" and that was a bit frustrating... – Andriy Berestovskyy Feb 12 '18 at 17:30
  • @Andriy - I echo your sentiment. Maybe my question was not clear. I am testing on Compiler Explorer right now. It is very interesting to see what the compiler is doing with this. – ttemple Feb 12 '18 at 17:49

The answer is: it depends. Do you have a reason to suspect that your compiler is not aware that code executed from within callback functions can be executed at any time? This is usually not the case with hosted-system compilers (Windows/Linux etc.), but may very well be the case in embedded systems, particularly bare metal or an RTOS.

This topic is kind of a beaten dead horse.

What volatile does:

  • Guarantees an up-to-date value in the variable, if the variable is modified from an external source (a hardware register, an interrupt, a different thread, a callback function etc).
  • Blocks all optimizations of read/write access to the variable.
  • Prevent dangerous optimization bugs that can happen to variables shared between several threads/interrupts/callback functions, when the compiler does not realize that the thread/interrupt/callback is called by the program. (This is particularly common among various questionable embedded system compilers, and when you get this bug it is very hard to track down.)

What volatile does not:

  • It does not guarantee atomic access or any form of thread-safety.
  • It cannot be used instead of a mutex/semaphore/guard/critical section. It cannot be used for thread synchronization.

What volatile may or may not do:

  • It may or may not be implemented by the compiler to provide a memory barrier, to protect against instruction cache/instruction pipe/instruction re-ordering issues in a multi-core environment. You should never assume that volatile does this for you, unless the compiler documentation explicitly states that it does.
Lundin