3

I have two threads which share a circular queue. The contents of the queue are unsigned numbers (unsigned long on x86_64). One thread is the producer and the other the consumer. The producer only writes to an element of the queue if that element's value is 0, and the producer always produces a non-zero value, whereas the consumer only consumes an element when its value is non-zero. The consumer also resets the element to 0 whenever it consumes it, so that the producer knows the consumer has consumed it.

Now, what I think is that since this scheme imposes a strict access order on the elements of the queue, we don't need synchronization or atomic variables. Is my assumption correct, or am I missing something here? Keep in mind that x86_64 has a relatively strict memory consistency model and only unrelated loads can be reordered before a store. It also has cache coherency, which proactively updates the caches. Also, I declare the shared variables volatile to be sure the compiler doesn't cache them.
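To make the scheme concrete, here is roughly what I mean (a simplified sketch; the names and the queue size are made up for illustration):

```cpp
#include <cstddef>

enum { SLOTS = 1024 };                    // illustrative size
volatile unsigned long queue[SLOTS];      // shared; 0 means "empty", starts zeroed

// Producer thread: writes a slot only after the consumer has zeroed it.
void produce(unsigned long value)         // value is always non-zero
{
    static std::size_t head = 0;          // used only by the producer thread
    while (queue[head] != 0)
        ;                                 // spin until the consumer empties this slot
    queue[head] = value;
    head = (head + 1) % SLOTS;
}

// Consumer thread: reads a slot only when it is non-zero, then resets it to 0.
unsigned long consume()
{
    static std::size_t tail = 0;          // used only by the consumer thread
    while (queue[tail] == 0)
        ;                                 // spin until the producer fills this slot
    unsigned long value = queue[tail];
    queue[tail] = 0;                      // tell the producer the slot is free again
    tail = (tail + 1) % SLOTS;
    return value;
}
```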

pythonic
  • Whenever you have two things being shared, you ultimately need some form of synchronization (assuming you want consistent results). That's just how it is. – GManNickG Apr 22 '12 at 09:37
  • I don't think it's required in this case. We are doing things in a strict order. – pythonic Apr 22 '12 at 09:38
  • What part of the code tells core A to flush its updated value out of the register or cache for core B to see? – GManNickG Apr 22 '12 at 09:40
  • The system is x86_64 with cache coherency and strict memory model. – pythonic Apr 22 '12 at 09:41
  • You miss the point: if you have some sort of loop like this `while (sharedVar == 0) /* wait */;`, the compiler is free to assume that `sharedVar` doesn't change (after all, nothing in your code indicates that it does), and will turn that into an infinite loop. Nothing prevents it from doing so. – GManNickG Apr 22 '12 at 09:48
  • GManNickG: No, I have declared them volatile, so the compiler does see them. – pythonic Apr 22 '12 at 09:51
  • Then my work is done for me, because [volatile is useless in multithreaded programming](http://stackoverflow.com/questions/2484980/why-is-volatile-not-considered-useful-in-multithreaded-c-or-c-programming) (this is an old and tired debate that I'd rather not get in to). – GManNickG Apr 22 '12 at 09:55
  • GManNickG: Sure, volatile is useless if you are treating it as an atomic variable, but it's not useless for telling the compiler not to cache a variable. In the situation you mentioned, volatile avoids exactly that infinite-loop problem. In other words, we are not using it as an atomic variable, just to tell the compiler not to cache it in some register. – pythonic Apr 22 '12 at 09:57
  • @user1018562: `volatile` tells the compiler not to optimize out re-reads of a memory location. It doesn't ensure that the CPU invalidates any of its memory caches. – CB Bailey Apr 22 '12 at 09:59
  • Charles Bailey: It's not the job of the compiler to invalidate caches; at least for x86-64, it's the job of the CPU. And remember x86 has proactive cache coherence too. So think in terms of x86 here. – pythonic Apr 22 '12 at 10:00
  • @user1018562: You didn't read my link, did you? Like I said, this is an old and tired debate. Just synchronize it and be done. Why ask a question if you're just going to argue with the answers you don't like? If you're not going to synchronize anyway, don't ask the question. You already seemed convinced of your answer. – GManNickG Apr 22 '12 at 10:01
  • GManNickG: I'm not arguing, I'm just trying to convince you that I don't use volatile as a replacement for atomic variables, but just to make sure the compiler doesn't cache some variable in a register. Is it also useless for that purpose? – pythonic Apr 22 '12 at 10:04
  • OK, so you're using a combination of `volatile` and further relying on the cache invalidation features of your particular machine architecture to _provide_ synchronization. You need synchronization and now you have it. It wouldn't be my preferred solution but if you're happy with it, go with it. – CB Bailey Apr 22 '12 at 10:09
  • "I'm not arguing, I'm just trying to convince you" ...and the definition of arguing is something other than "trying to convince or persuade"? Yes, `volatile` is useless. You don't have any ordering for your program without memory barriers, and once you add those you don't need `volatile` anymore because it's redundant. – GManNickG Apr 22 '12 at 10:12
  • @GManNickG He has memory ordering, which is provided by the underlying hardware. x86_64 has strict memory ordering even with respect to different cores/caches - without the use of memory fences. `volatile` isn't useless in _this_ case. – Gunther Piez Apr 22 '12 at 10:32
  • Rule of thumb: if you're not sure if you need synchronization (in the form of mutexes or barriers), use it. 1. Do it 2. Do it right 3. Do it fast. (If you're not sure if you need synchronization, you are in stage 2). – ninjalj Apr 22 '12 at 12:17
  • @drhirsch: there are certain cases where barriers are needed even in x86. See http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/ – ninjalj Apr 22 '12 at 12:19
  • @ninjalj Yes, and if you read the article you see that this is _not_ one of them, because the only possible race condition is when the consumer and producer are at the same memory location. – Gunther Piez Apr 22 '12 at 13:32
  • @user1018562: Maybe aligning the queue entries to sizeof(long) is still essential, or you could use some simple sync primitive to read and reset an entry (e.g. http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html); the second option would be safer. I say that because I have two worries. – wedgwood Apr 23 '12 at 01:54
  • @user1018562: continued ... 1. Can your platform access one word atomically? 2. Even if the platform can do 1, can it atomically access a word that is split across two aligned words (i.e. is not aligned)? Also, I think that when the queue is full, the reader and writer may access the same entry (one reads and resets, one reads and writes). – wedgwood Apr 23 '12 at 02:02

3 Answers

3

Yes, you need synchronization, because the producer and consumer may be trying to read from and/or write to the same location at the same time if the consumer has caught up to the producer or vice versa.

Even if your processor performs atomic operations on the data type you are using, you usually need to explicitly request atomic operations (through an appropriate API) to get the appropriate memory barriers and to ensure consistency when your threads are running on different cores.
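For illustration, the same slot-based handoff could be expressed with C++11 `std::atomic` instead of plain `volatile` (just a sketch with invented names and sizes, not your actual code):

```cpp
#include <atomic>
#include <cstddef>

enum { SLOTS = 1024 };
std::atomic<unsigned long> queue[SLOTS];   // static storage, so the slots start at 0 ("empty")

// Producer: wait for the slot to be free, then publish a non-zero value.
// The release store makes everything written before it visible to the
// consumer that later observes the non-zero value.
void produce(std::size_t slot, unsigned long value)
{
    while (queue[slot].load(std::memory_order_acquire) != 0)
        ;                                  // spin until the consumer frees the slot
    queue[slot].store(value, std::memory_order_release);
}

// Consumer: wait for a non-zero value, take it, then hand the slot back.
unsigned long consume(std::size_t slot)
{
    unsigned long value;
    while ((value = queue[slot].load(std::memory_order_acquire)) == 0)
        ;                                  // spin until the producer fills the slot
    queue[slot].store(0, std::memory_order_release);
    return value;
}
```

On x86_64 these acquire/release operations compile down to ordinary loads and stores, so they cost essentially nothing, but they also stop the compiler from reordering or caching the accesses.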

CB Bailey
  • Why? Like I said, we have strict-order access to the queue: one thread does not read until the value is non-zero, the other does not write until the value in the queue is zero. So how can we have a problem here? And we have an x86_64 system with automatic cache coherence and a relatively strict memory model. – pythonic Apr 22 '12 at 09:43
  • @user1018562: You said you have two threads accessing the same queue. What are you doing in the case that the consumer wants to read but the queue is empty? If you want to be signalled then you'll want to use some sort of semaphore; if you're spinning on a memory location then you'll want the appropriate memory primitive to ensure the updated value is propagated to all caches as soon as it is written. – CB Bailey Apr 22 '12 at 09:48
  • Charles Bailey: I agree with what you are saying, but that was not my question. Please read it carefully. – pythonic Apr 22 '12 at 09:52
  • @user1018562: To be fair it keeps changing. I've read it again, now. – CB Bailey Apr 22 '12 at 09:54
  • Memory barriers are only required if the underlying hardware has weak memory ordering. In x86_64 this is not the case except for a few instructions which are explicitly requesting weak memory ordering (like `movntdq` etc.) – Gunther Piez Apr 22 '12 at 10:15
2

I think you don't need synchronization or atomic variables.

The two threads, one producer and one consumer, won't conflict by writing to the same entry.

That's because the two threads can't operate on the same location if your circular queue implementation is correct (e.g. one thread only reads the head variable and the other only writes the tail variable). There is no need to restrict the entry structure.

And there seems to be no need to reset an entry after you read it. Because you advance your read head, the producer thread can tell whether an entry may be written by comparing the read head variable with the write tail variable.
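For example, here is a rough sketch of the index-based queue I mean (I show the two shared indices as C++11 `std::atomic` only to keep the example portable and to stop the compiler from caching them; the names are illustrative):

```cpp
#include <atomic>
#include <cstddef>

enum { SLOTS = 1024 };
unsigned long data[SLOTS];
std::atomic<std::size_t> head{0};   // next slot the producer will write; written only by the producer
std::atomic<std::size_t> tail{0};   // next slot the consumer will read; written only by the consumer

// Producer: returns false when the queue is full.
bool try_produce(unsigned long value)
{
    std::size_t h = head.load(std::memory_order_relaxed);
    if ((h + 1) % SLOTS == tail.load(std::memory_order_acquire))
        return false;                                    // full: would catch up with the reader
    data[h] = value;
    head.store((h + 1) % SLOTS, std::memory_order_release);
    return true;
}

// Consumer: returns false when the queue is empty.
bool try_consume(unsigned long* out)
{
    std::size_t t = tail.load(std::memory_order_relaxed);
    if (t == head.load(std::memory_order_acquire))
        return false;                                    // empty: nothing published yet
    *out = data[t];
    tail.store((t + 1) % SLOTS, std::memory_order_release);
    return true;
}
```

Since each index is written by only one thread, the producer and consumer never touch the same `data[]` slot at the same time, and no entry ever needs to be reset.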

I hope that's helpful for you :)

wedgwood
0

It seems that you don't need synchronization if the producer and consumer access only one variable. But since you said you use a circular buffer, you use an index. The index variable is vulnerable if not synchronized.
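A minimal sketch of what I mean, assuming a single index that both threads touch (hypothetical names):

```cpp
#include <atomic>
#include <cstddef>

// If both threads read and update the same index, it must be atomic (or
// protected by a mutex); a plain std::size_t here would be a data race.
std::atomic<std::size_t> index{0};

std::size_t next_slot(std::size_t slots)
{
    // fetch_add is an atomic read-modify-write, so concurrent callers
    // each obtain a distinct slot.
    return index.fetch_add(1, std::memory_order_relaxed) % slots;
}
```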

Israel Unterman
  • I have an index, but thread A and thread B use their own thread local indexes. I don't see a problem with that. Do you? I mean the consumer will stop anyhow when it sees a zero value and the producer will stop when it sees a non-zero value. – pythonic Apr 22 '12 at 09:47
  • @user1018562 why not just use a read index and a write index for the ring queue? If you use the value of an entry to indicate whether the entry can be written, it will make your program more complex, especially when the ring queue is empty. – wedgwood Apr 22 '12 at 10:25
  • wedgwood: Previously I was using that technique, but it made the program slower, perhaps due to true sharing (cache-coherence overhead). If you check for zero, there is less overhead; only when the queue is empty, or when the producer and consumer are accessing slots very close to each other, is there a bit of overhead. You may get false sharing, but you can avoid that by placing each element in a separate cache line. In other words, we avoid the cache-coherence overhead by making the producer and consumer depend only on the values in the queue rather than checking read/write indexes. – pythonic Apr 22 '12 at 10:30
  • @user1018562: in that case I also don't see a problem. It's like one thread always writes and the other one always reads, and then they exchange roles using a token. Seems OK. – Israel Unterman Apr 22 '12 at 11:10