
I have a program that occasionally malfunctions, and I'm wondering whether the problems might be related to different threads, running on different cores, handling reads and writes in a different order. (I know the x86 memory model requires different cores to do things mostly as expected, but there are some cases where reads and writes could not be reordered on a single CPU yet might be on a multi-core system.) Setting processor affinity to some specifically-selected arbitrary CPU core doesn't seem like a good idea: if that core happens to be busy, there's no reason all the threads shouldn't be able to migrate together to some other core, provided there's a full cache flush first. Is there any way to simply direct that all threads must run on the same core, without specifying which one?

PS: My understanding is that if one thread writes some data to a class instance and then does a CompareExchange on a class reference (so the reference will point to the newly-modified instance), that implies all changes to the instance will be written out to memory before the class reference is. Code running in another thread on the same CPU which uses that class reference will either use the old value of the reference or will see the changes that were made to the instance; code running on other CPUs, however, could in some tricky circumstances see the new value of the class reference but not the new data that was written to the instance. Is my understanding in error?
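For concreteness, here's a minimal sketch of the pattern I'm describing (the type and member names are made up for illustration):

```csharp
using System.Threading;

class Snapshot
{
    public int Value;   // ordinary field, written before the reference is published
}

class Publisher
{
    private Snapshot _current = new Snapshot();

    // Writer thread: fill in a fresh instance, then publish it atomically.
    public void Publish(int newValue)
    {
        Snapshot old = _current;
        var fresh = new Snapshot { Value = newValue };          // writes to the instance...
        Interlocked.CompareExchange(ref _current, fresh, old);  // ...then publish the reference
    }

    // Reader thread: can this ever see the new reference but a stale Value?
    public int Read() => _current.Value;
}
```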

supercat
  • (Re: PS) See http://stackoverflow.com/questions/1581718/does-interlocked-compareexchange-use-a-memory-barrier . On x86, the LOCK prefix (x86 asm) is used to resolve that situation. – meklarian Jun 28 '11 at 22:19
  • Can you show the exact code that's implementing the lock-free protocol in question? – bdonlan Jun 28 '11 at 22:26
  • Regarding your PS: This pattern is very common. If any other thread (regardless of which CPU/core it runs on) can see the new reference, it will also see the data referenced by it. I don't think that .NET's memory model is as rigorously specified as Java's, but all current implementations will behave like this. – Ringding Jun 29 '11 at 11:39
  • @Ringding: What if the reference is taken from a pool? My code does sometimes pool a couple of objects to avoid re-creating them (see the sketch after these comments). – supercat Jun 29 '11 at 12:55
  • Then you really need to show more of your particular implementation. But it smells like something you shouldn't do. – Ringding Jun 29 '11 at 13:43
  • @Ringding: It seems at least one of my bugs was a silly mistake on my part: I have an IEnumerable which I update using CompareExchange, so any existing enumeration should not be affected by changes, but my code required two consecutive enumerations to return identical elements, which of course won't happen. Still, since I want to design software that will work correctly even if the system behaves in the most evil way permissible under the specification, I'm concerned about making sure that code doesn't rely upon things that will "usually" happen a certain way. – supercat Jun 29 '11 at 14:28
  • @Ringding: There seems to be some other weirdness that would cause corruption in a few characters in a long string produced via StringBuilder. Not quite sure what was going on there, since some of the glitched characters were in the middle of formatted numeric output (e.g. 1234 became 12¥4). The StringBuilder is only assembled in one thread, and no thread should be doing anything with it until the thread that's building it is done. The generated strings are written to a database, and the database is showing the corruption. I don't know... – supercat Jun 29 '11 at 14:34
  • @Ringding: ...where in the process that error occurs. I would not expect this bug to have anything to do with the aforementioned IEnumerable bug, since a faulty enumeration might cause some items to be omitted entirely, but should not cause corruption of a string written via AppendFormat. – supercat Jun 29 '11 at 14:35
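For illustration, a hedged sketch (hypothetical names) of the pooling pattern discussed in the comments above; the hazard is that a recycled instance can be mutated by the writer while a reader still holds a reference to it, which breaks the publish-then-read guarantee:

```csharp
using System.Collections.Concurrent;
using System.Threading;

class Node { public int Payload; }

class PooledPublisher
{
    private readonly ConcurrentBag<Node> _pool = new ConcurrentBag<Node>();
    private Node _current = new Node();

    public void Publish(int payload)
    {
        // Reuse an old instance instead of allocating a fresh one.
        if (!_pool.TryTake(out Node node))
            node = new Node();
        node.Payload = payload;                         // mutate a *recycled* object
        Node old = Interlocked.Exchange(ref _current, node);
        _pool.Add(old);                                 // readers may still hold 'old'!
    }

    // A reader that grabbed a reference earlier may see its fields change
    // underneath it once the writer recycles that instance from the pool.
    public int Read() => _current.Payload;
}
```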

1 Answer


No, and it wouldn't fix your problem anyway. Even on a single core, the OS can reschedule your program at any time, causing the same problems. You might be able to make the problem less likely to happen - but that just means that when it inevitably appears in the field, it will be that much harder to debug. Fix your lack of locking now, before it comes back to bite you later.
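For example, a minimal sketch (made-up names) of guarding shared state with an ordinary monitor lock instead of relying on core placement; lock entry and exit act as full memory barriers:

```csharp
// Illustrative only: all reads and writes of _current go through one lock,
// so no thread can observe a partially-published update.
class SharedBox
{
    private readonly object _sync = new object();
    private string _current = "";

    public void Set(string value) { lock (_sync) { _current = value; } }

    public string Get() { lock (_sync) { return _current; } }
}
```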

To be more specific: there is no Windows (or Linux) function that tells the OS, "keep all my threads on the same core". You can tell the OS to keep them all on some specific core, but you can't leave the choice of core floating like that. Since memory barriers are relatively cheap, it's best simply to use them correctly. Even locked operations are relatively cheap on modern processors: the CPU obtains exclusive ownership of the cache line when it begins the read part of the operation (which it would need for the write anyway) and refuses to release it until the locked operation is complete.
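For reference, a sketch of the closest available knob in .NET: pinning the whole process to one specific core via the affinity mask (core 0 here, chosen arbitrarily). There is no "any single core of the OS's choosing" variant, which is the point:

```csharp
using System;
using System.Diagnostics;

class PinToOneCore
{
    static void Main()
    {
        // Pin the whole process to logical CPU 0 (bit 0 of the affinity mask).
        // This names a *specific* core; the OS won't float the threads among
        // cores while keeping them together.
        Process self = Process.GetCurrentProcess();
        self.ProcessorAffinity = (IntPtr)0x1;
        Console.WriteLine("Affinity mask: 0x{0:X}", (long)self.ProcessorAffinity);
    }
}
```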

bdonlan
  • +1 I was going to write the same answer but couldn't figure out how to word it well. – Scott Chamberlain Jun 28 '11 at 21:51
  • See my addendum above. I've tried to read up on memory barriers, but I'm troubleshooting code written at a time when I thought CompareExchange implied that all other threads would see all data written before the CompareExchange. I know that threads which don't access data through a particular reference won't have any read sequencing relative to that reference, but at least in the single-CPU case I can't see any way code could access data through a reference before reading that reference in a way that would cause trouble. – supercat Jun 28 '11 at 22:15
  • Incidentally, whether or not the Heisenbugs I'm having are caused by multi-core issues, I would think there would be another advantage to forcing code to run on one core: if a process can run on multiple cores simultaneously, things like locks and CompareExchange will require full memory barriers; if a process is limited to a single core at a time, such things may safely be eliminated assuming a full cache flush if a process gets migrated from one CPU to another. If there will be enough other processes to use up all the cores, I would think this could improve performance. – supercat Jun 29 '11 at 12:59
  • Full memory barriers are relatively cheap on x86 - cheap enough that any performance gain from threading, however small, is likely to make up for them. – bdonlan Jun 29 '11 at 15:25