I just read an MSDN article, "Synchronization and Multiprocessor Issues", that addresses memory cache consistency issues on multiprocessor machines. It was really eye-opening to me, because I would not have thought there could be a race condition in the example it provides. The article explains that writes to memory might not actually occur (from the perspective of another CPU) in the order written in my code. This is a new concept to me!

The article provides two solutions:

  1. Using the "volatile" keyword on variables that need cache consistency across multiple CPUs. This is a C/C++ keyword, and not available to me in Delphi.
  2. Using InterlockedExchange() and InterlockedCompareExchange(). This is something I could do in Delphi if I had to. It just seems a little messy.

The article also mentions that "The following synchronization functions use the appropriate barriers to ensure memory ordering: • Functions that enter or leave critical sections".

This is the part I don't understand. Does this mean that any writes to memory that happen only inside code protected by critical sections are immune to cache consistency and memory ordering issues? I have nothing against the Interlocked*() functions, but another tool in my tool belt would be good to have!

Troy

2 Answers

This MSDN article is just the first step of multi-threaded application development: in short, it means "protect your shared variables with locks (a.k.a. critical sections), because you are not sure that the data you read/write is the same for all threads".

The per-core CPU cache is just one of the possible issues that can lead to reading wrong values. Another issue that can lead to a race condition is two threads writing to a resource at the same time: it's impossible to know which value will be stored afterward.

Since code expects the data to be coherent, some multi-threaded programs may behave wrongly. With multi-threading, you are not sure that the code you write, via individual instructions, is executed as expected when it deals with shared variables.
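
To make the race concrete, here is a minimal sketch (not from the MSDN article; Counter and TRacyThread are hypothetical names) of two threads incrementing an unprotected shared integer:

var Counter: Integer = 0; { shared, deliberately unprotected }

procedure TRacyThread.Execute; { hypothetical TThread descendant }
var i: Integer;
begin
  for i := 1 to 1000000 do
    Inc(Counter); { read-modify-write without LOCK: two threads running
                    this loop concurrently will lose increments, so the
                    final value will usually be less than 2000000 }
end;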

The InterlockedExchange/InterlockedIncrement functions compile to low-level asm opcodes with a LOCK prefix (or opcodes which are locked by design, like XCHG EDX,[EAX]), which will indeed force cache coherency across all CPU cores and therefore make the asm opcode execution thread-safe.

For instance, here is how a string reference count is handled when you assign a string value (see _LStrAsg in System.pas - this is from our optimized version of the RTL for Delphi 7/2002, since the original Delphi code is copyrighted):

            MOV     ECX,[EDX-skew].StrRec.refCnt
            INC     ECX   { non-atomic increment of a local copy of the count }
            JG      @@1   { refCnt=-1 -> literal string -> jump not taken }
            .....
       @@1: LOCK INC [EDX-skew].StrRec.refCnt { ATOMIC increment of reference count }
            MOV     ECX,[EAX]   
            ...

There is a difference between the first INC ECX and LOCK INC [EDX-skew].StrRec.refCnt: not only does the first increment ECX (a local copy) rather than the reference count variable itself, but the first is also not thread-safe, whereas the second is prefixed by LOCK and is therefore thread-safe.

By the way, this LOCK prefix is one of the problems of multi-thread scaling in the RTL - it behaves better with newer CPUs, but is still not perfect.

So using critical sections is the easiest way of making code thread-safe:

uses Windows; { for TRTLCriticalSection and the EnterCriticalSection API }

var GlobalVariable: string;
    GlobalSection: TRTLCriticalSection;
    { InitializeCriticalSection(GlobalSection) must be called once at
      startup, and DeleteCriticalSection(GlobalSection) at shutdown }

procedure TThreadOne.Execute;
var LocalVariable: string;
begin
   ...
   EnterCriticalSection(GlobalSection);
   LocalVariable := GlobalVariable+'a'; { thread-safe modification of GlobalVariable }
   GlobalVariable := LocalVariable;
   LeaveCriticalSection(GlobalSection);
   ...
end;

procedure TThreadTwo.Execute;
var LocalVariable: string;
begin
   ...
   EnterCriticalSection(GlobalSection);
   LocalVariable := GlobalVariable; { thread-safe read of GlobalVariable }
   LeaveCriticalSection(GlobalSection);
   ...
end;

Using a local variable makes the critical section shorter, so your application will scale better and make use of the full power of your CPU cores. Between EnterCriticalSection and LeaveCriticalSection, only one thread will be executing the protected code: the other threads will wait in the EnterCriticalSection call... So the shorter the critical section is, the faster your application will be. Some wrongly designed multi-threaded applications can actually be slower than single-threaded ones!

And do not forget that if the code inside your critical section may raise an exception, you should always write an explicit try ... finally LeaveCriticalSection() end; block to protect the lock release and prevent any deadlock of your application.
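
For instance, the pattern looks like this (a sketch reusing GlobalSection and GlobalVariable from the code above):

EnterCriticalSection(GlobalSection);
try
  GlobalVariable := GlobalVariable+'a'; { protected code which may raise }
finally
  LeaveCriticalSection(GlobalSection); { always executed, even on exception }
end;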

Delphi is perfectly thread-safe if you protect your shared data with a lock, i.e. a critical section. Be aware that even reference-counted variables (like strings) should be protected: the LOCK inside their RTL functions is there only to ensure correct reference counting and avoid memory leaks, and it does not make your own use of the variable thread-safe. To make it as fast as possible, see this SO question.

The purpose of InterlockedExchange and InterlockedCompareExchange is to change a shared pointer variable value. You can see them as "light" versions of a critical section for accessing a pointer value.
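
For example, here is a sketch of lock-free one-time publication of a shared object, assuming the classic Pointer-based InterlockedCompareExchange overload declared in the Windows unit of older Delphi versions (GlobalObj and EnsureCreated are hypothetical names):

var GlobalObj: TObject = nil; { shared pointer, published once }

procedure EnsureCreated;
var New: TObject;
begin
  New := TObject.Create;
  { store New into GlobalObj only if GlobalObj is still nil; the call
    returns the previous value, so a non-nil result means another
    thread won the race and our instance must be discarded }
  if InterlockedCompareExchange(Pointer(GlobalObj), Pointer(New), nil) <> nil then
    New.Free;
end;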

In any case, writing working multi-threaded code is not easy - it's even hard, as a Delphi expert just wrote on his blog.

You should either write simple threads with no shared data at all (make a private copy of the data before the thread starts, or use read-only shared data, which is thread-safe by essence), or call some well-designed and proven libraries - like http://otl.17slon.com - which will save you a lot of debugging time.

Arnaud Bouchez
  • Thanks for the tips. Very good tip on avoiding shared memory and just making a copy to work on. Also, I can see how InterlockedExchange might help performance. I just don't want to prematurely optimize. If a critical section makes my code more readable, I should start there. The example in the MSDN article of using Interlocked*() works, but it's confusing. I'll go check out your links now. – Troy Aug 29 '11 at 14:44

First of all, according to the language standards, volatile doesn't do what the article says it does. The acquire and release semantics of volatile are MSVC-specific. This can be a problem if you compile with other compilers or on other platforms. C++11 introduces language-supported atomic variables which will hopefully, in due course, finally put an end to the (mis-)use of volatile as a threading construct.

Critical sections and mutexes are indeed implemented so that reads and writes of protected variables will be seen correctly from all threads.

I think the best way to think of critical sections and mutexes (locks) is as devices to bring about serialization. That is, blocks of code protected by such locks are executed serially, one after another without overlap. The serialization applies to memory access also. There can be no problems due to cache coherence or read/write reordering.
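
To make the visibility guarantee concrete, here is a minimal sketch in the question's language (Section, Data, Ready, WriterThread and ReaderThread are hypothetical names; the critical section is assumed to be initialized at startup):

var Section: TRTLCriticalSection; { assumed initialized elsewhere }
    Data: Integer;
    Ready: Boolean;

procedure WriterThread;
begin
  EnterCriticalSection(Section);
  Data := 42;     { these writes cannot become visible out of order... }
  Ready := True;
  LeaveCriticalSection(Section);
end;

procedure ReaderThread;
begin
  EnterCriticalSection(Section);
  if Ready then        { ...so a reader holding the same lock }
    Assert(Data = 42); { is guaranteed to see Data already set }
  LeaveCriticalSection(Section);
end;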

Interlocked functions are implemented using hardware-based locks on the memory bus. These functions are used by lock-free algorithms. What this means is that they don't use heavyweight locks like critical sections, but rather these lightweight hardware locks.

Lock-free algorithms can be more efficient than those based on locks, but they are very much harder to write correctly. Prefer critical sections over lock-free techniques unless the performance implications are discernible.
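
As a simple example of the lightweight approach, a shared counter can be updated with a single interlocked call instead of a full critical section (a sketch; InterlockedIncrement is declared in the Windows unit, and TWorker is a hypothetical TThread descendant):

var Counter: Integer = 0;

procedure TWorker.Execute;
var i: Integer;
begin
  for i := 1 to 1000000 do
    InterlockedIncrement(Counter); { one locked hardware instruction per
      call: atomic across all cores, no critical section needed here }
end;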

Another article well worth reading is The "Double-Checked Locking is Broken" Declaration.
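
Transposed to Delphi, the problematic pattern that the Declaration warns about looks roughly like this (a sketch; FInstance, Section and TMyObject are hypothetical names):

if FInstance = nil then              { unsynchronized first check }
begin
  EnterCriticalSection(Section);
  try
    if FInstance = nil then          { second check, under the lock }
      FInstance := TMyObject.Create; { broken: without memory barriers,
        another thread's first check may observe a non-nil FInstance
        before the constructor's writes to the object become visible }
  finally
    LeaveCriticalSection(Section);
  end;
end;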

David Heffernan
  • David, there are often references to booleans, integers etc. as being atomic (if properly aligned) and thus thread-safe. I think the accepted answer here http://stackoverflow.com/questions/510031/list-of-delphi-data-types-with-atomic-read-write-operations puts this in perspective. Quote: "Reads are thread safe. Writes are not thread safe." – LU RD Aug 28 '11 at 20:54
  • @LU RD You need to be precise. What do you mean by threadsafe? And precisely what does "Reads are thread safe, writes are not thread safe" mean? – David Heffernan Aug 28 '11 at 21:48
  • it was just a general warning not to assume one can consider operations on so called atomic variables safe to use in all threading contexts. – LU RD Aug 28 '11 at 22:10
  • Thanks, David, for the info on "volatile" on non-MS compilers, and for the info on lighter-weight lock-free algorithms. But, just to clarify: if I serialize all access to a variable through a critical section, when one thread on one CPU writes the variable, and the next thread on another CPU, reading from a different memory cache, reads it, will it definitely see the value written by the original thread? Once the first thread writes and leaves the critical section, will all the caches for all the CPUs be consistent? – Troy Aug 28 '11 at 22:20
  • Yes, so long as you protect all access to the memory, both read and write, by the same lock. Otherwise, what would be the point of a lock? – David Heffernan Aug 28 '11 at 22:27
  • Yes, but the article I mentioned gives me the feeling it's not quite so simple. With multiple per-core memory caches, the cache isn't always consistent between cores for the same memory location, so special care has to be taken. I know that critical sections/locks serialize access to the code. But does the operating system, or even the hardware, act in a special way when you're in the middle of a critical section to ensure that any writes to memory are pushed out to the caches of the other cores before the critical section is exited? – Troy Aug 28 '11 at 22:58
  • It's invariably shared memory that means you need to serialize. Hence my previous comment. – David Heffernan Aug 28 '11 at 23:01
  • Sorry, I just edited my previous question (hit the Enter key too quickly). Would you mind checking it out again. :) Thanks! – Troy Aug 28 '11 at 23:03
  • We are going round in circles. Locks protect you against reordering and coherence. The point of serialization is access to shared data. – David Heffernan Aug 28 '11 at 23:06
  • @LU RD It should be better: "Read-only access with no possible write from another thread is thread-safe, whereas reads with potential background writes are not thread-safe". – Arnaud Bouchez Aug 29 '11 at 07:31
  • @Arnaud, yes I agree totally. I was just quoting the link I referred to. – LU RD Aug 29 '11 at 08:10
  • I understand your logic completely from the perspective of doing multi-threaded code on a single memory cache/CPU. In the good old days of a single CPU, there were no consistency issues between multiple caches. But are you saying that restricting your access to a specific piece of memory to critical sections will "flush" any writes to RAM and thus avoid consistency issues between multiple memory caches? – Troy Aug 29 '11 at 14:32
  • @Troy Yes, that is correct. Locks protect you against reordering and coherence issues. – David Heffernan Aug 29 '11 at 14:56