1

I use two pthreads, where one thread "notifies" the other one of an event; for that there is a variable (a normal integer) which is set by the second thread.

This works, but my question is: is it possible that the update is not seen immediately by the first (reading) thread, meaning the cache is not updated directly? And if so, is there a way to prevent this behaviour, e.g. like the volatile keyword in Java?

(the frequency at which the event occurs is approximately in the microsecond range, so a more or less immediate update needs to be enforced).

/edit: 2nd question: is it possible to enforce that the variable is held in the cache of the core where thread 1 runs, since that thread is reading it all the time?

David
  • C has the `volatile` keyword too, which tells the compiler that the variable might change "by itself" (through the control of something other than the current thread) so that it doesn't perform some optimizations that assume the variable won't change. – cdhowie Aug 23 '11 at 18:38
  • 3
    Yes, but `volatile` isn't intended to be used as a threading construct in C. – dwerner Aug 23 '11 at 18:38
  • No, it's not, but if you're using some global variable as a signalling mechanism, it can ensure that the compiler doesn't optimize away your tests. Things are frequently useful outside of the parameters of their original intentions. – cdhowie Aug 23 '11 at 18:39
  • 1
    @cd but what about cache coherence? – David Heffernan Aug 23 '11 at 18:53
  • Compiler optimization does not seem to be the problem - the thread notices the update "eventually". What I am concerned, is that the cache of the Core is not updated fast enough ( = instantly). – David Aug 23 '11 at 19:08
  • See also: http://stackoverflow.com/questions/6639825/how-can-i-convert-non-atomic-operation-to-atomic/6640437#6640437 – ninjalj Aug 23 '11 at 19:46
  • Ah, I misunderstood the question. – cdhowie Aug 23 '11 at 20:13

4 Answers

2

It sounds to me as though you should be using a pthread condition variable as your signaling mechanism. This takes care of all the issues you describe.

David Heffernan
  • I tried a signalling mechanism first, but the performance was too bad, so I'd like to avoid locking if possible. – David Aug 23 '11 at 18:57
  • 1
    That doesn't make any sense to me – David Heffernan Aug 23 '11 at 18:59
  • Since the only information I need to transfer is an int, and int writes can be performed atomically, I guess it's not necessary to lock for that. – David Aug 23 '11 at 19:11
  • Is the listener running a busy loop? And when you signal by writing to this variable, how do you know that the writing thread won't write another value before the first one was read? What are you actually doing? What is your underlying problem and solution design? – David Heffernan Aug 23 '11 at 19:19
  • It is a network protocol, where there are two threads, one who deals with incoming packets, and one who deals with outgoing packets. The incoming one always updates the highest ACK'ed packet, so that the outgoing knows what it needs to do. Since the outgoing thread needs to send data at a high rate, it's "nearly" a busy loop, which in every iteration checks what the highest ACK'ed packet is. (obviously I can reduce the number of reads, but the problem is that even if I read it every time, the thread does not get the current value for too long) – David Aug 23 '11 at 19:22
  • (I considered merging the two threads to one already, but I thought it'd be nicer if there is a fix for that problem without merging everything to single-threaded) – David Aug 23 '11 at 19:25
  • Do you have any equivalent of Windows Interlocked*** functions, implemented by x86 LOCK instruction? – David Heffernan Aug 23 '11 at 19:32
  • And you don't care about any synchronisation between the threads. The races in your design are benign? – David Heffernan Aug 23 '11 at 19:35
  • Correctness is no problem: A just writes, and B just reads. The only problem is that the cache is not flushed from the core where A runs to the core where B runs as fast as it needs to be. – David Aug 23 '11 at 19:40
  • And you really don't care if B misses some of A's writes. – David Heffernan Aug 23 '11 at 19:43
  • That is even a good thing, because the writes of A are always updates, so old ones are always out of date, and only the newest one is "correct". From a logical point of view, they should simply look at the very same variable. – David Aug 23 '11 at 19:48
2

It may not be immediately visible to the other processors, but not because of cache coherence. The biggest visibility problems will be due to your processor's out-of-order execution or to your compiler re-ordering instructions while optimizing.

In order to avoid both these problems, you have to use memory barriers. I believe that most pthread primitives are natural memory barriers, which means you shouldn't expect loads or stores to be moved across the boundaries formed by the lock and unlock calls. The volatile keyword can also disable a certain class of compiler optimizations, which can help when writing lock-free algorithms, but it's not a substitute for memory barriers.

That being said, I recommend you don't do this manually; there are quite a few pitfalls associated with lock-free algorithms. Leaving these headaches to library writers should make you a happier camper (unless you're like me and you love headaches :) ). So my final recommendation is to ignore everything I said and use what vromanov or David Heffernan suggested.

Ze Blob
2

The most appropriate way to pass a signal from one thread to another should be to use the runtime library's signalling mechanisms, such as mutexes, condition variables, semaphores, and so forth.

If these have too high an overhead, my first thought would be that there was something wrong with the structure of the program. If it turned out that this really was the bottleneck, and restructuring the program was inappropriate, then I would use atomic operations provided by the compiler or a suitable library.

Using plain int variables, or even volatile-qualified ones, is error-prone, unless the compiler guarantees they have the appropriate semantics. e.g. MSVC makes particular guarantees about the atomicity and ordering constraints of plain loads and stores to volatile variables, but gcc does not.

Anthony Williams
1

A better way is to use atomic variables. For example, you can use libatomic. The volatile keyword is not enough.

vromanov
  • Yes, but with normal loads/stores instead of atomic instructions, the variable might not get immediately flushed from the store buffer. An alternative to libatomic is http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html , FWIW. – janneb Aug 23 '11 at 19:24
  • Updates are atomic, but code like this "if(variable!=0) { variable=0; do_something();}" isn't safe. – vromanov Aug 23 '11 at 19:30
  • Actually janneb got what I mean, correctness is _not_ an issue, only caching is the problem. – David Aug 23 '11 at 19:32
  • Looking at [link](http://golubenco.org/?p=7), for set and read, the only operations I need, is the volatile keyword sufficient? – David Aug 23 '11 at 19:53
  • Just an example. Imagine this sequence: the 1st thread sets the flag. The 2nd thread sees the flag and clears it. The 1st thread sets the flag again. ... This is the good case. The bad case: the 1st thread sets the flag. The 2nd thread sees the flag. The 1st thread sets the flag again. The 2nd thread clears the flag. In this case you skip the second flag. If this isn't an issue, you can use volatile. – vromanov Aug 23 '11 at 20:09
  • Yeah, that's not an issue. But I figured out that maybe that's not the problem; maybe the OS scheduling is what makes it problematic in my case. I totally forgot that that could also be the case. But thanks for your information – David Aug 23 '11 at 20:35