This MSDN article is only the first step of multi-threaded application development. In short, it says: "protect your shared variables with locks (aka critical sections), because you cannot be sure that the data you read or write is the same for all threads".
The per-core CPU cache is just one of the possible issues that can lead to reading wrong values. Another issue that can cause a race condition is two threads writing to a resource at the same time: it is impossible to know which value will be stored afterward.
Since code expects the data to be coherent, some multi-threaded programs may behave wrongly. With multi-threading, you cannot be sure that the code you write is executed, instruction by instruction, as you expect when it deals with shared variables.
The `InterlockedExchange`/`InterlockedIncrement` functions map to low-level asm opcodes with a `LOCK` prefix (or locked by design, like the `XCHG EDX,[EAX]` opcode), which force cache coherency across all CPU cores and therefore make the execution of that opcode thread-safe.
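As a minimal sketch of the point above, several threads can bump one shared counter with `InterlockedIncrement` and never lose an update. The names `TCounterThread`, `SharedCounter`, `THREAD_COUNT` and `LOOPS_PER_THREAD` are hypothetical and exist only for this illustration. The sketch assumes Delphi on Windows, with the `Windows` and `Classes` units reachable under those names (newer versions may need unit scope names enabled).

```pascal
program InterlockedCounterDemo;

{$APPTYPE CONSOLE}

uses
  Windows, Classes;

const
  THREAD_COUNT = 4;            // illustrative values only
  LOOPS_PER_THREAD = 100000;

var
  SharedCounter: Integer = 0;  // shared by all worker threads

type
  // hypothetical worker thread, used only for this illustration
  TCounterThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TCounterThread.Execute;
var
  i: Integer;
begin
  for i := 1 to LOOPS_PER_THREAD do
    InterlockedIncrement(SharedCounter); // atomic read-modify-write
end;

var
  Workers: array[0..THREAD_COUNT - 1] of TCounterThread;
  i: Integer;
begin
  for i := 0 to High(Workers) do
    Workers[i] := TCounterThread.Create(False); // False = start immediately
  for i := 0 to High(Workers) do
  begin
    Workers[i].WaitFor; // block until the thread has finished
    Workers[i].Free;
  end;
  Writeln(SharedCounter); // prints 400000
end.
```

Replacing the call with a plain `Inc(SharedCounter)` would typically print a smaller number on a multi-core machine, since concurrent increments can overwrite each other.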
For instance, here is how a string reference count is implemented when you assign a string value (see `_LStrAsg` in System.pas; this is from our optimized version of the RTL for Delphi 7/2002, since the original Delphi code is copyrighted):

         MOV ECX,[EDX-skew].StrRec.refCnt
         INC ECX    { thread-unsafe increment ECX = reference count }
         JG @@1     { ECX=-1 -> literal string -> jump not taken }
         .....
    @@1: LOCK INC [EDX-skew].StrRec.refCnt { ATOMIC increment of reference count }
         MOV ECX,[EAX]
         ...

There is a difference between the first `INC ECX` and `LOCK INC [EDX-skew].StrRec.refCnt`: not only does the first increment ECX rather than the reference count variable, but the first is not thread-safe, whereas the second is prefixed by `LOCK` and is therefore thread-safe.
By the way, this LOCK prefix is one of the problems for multi-thread scaling in the RTL: it's better with newer CPUs, but still not perfect.
So using critical sections is the easiest way of making code thread-safe:

    var GlobalVariable: string;
        GlobalSection: TRTLCriticalSection;

    procedure TThreadOne.Execute;
    var LocalVariable: string;
    begin
      ...
      EnterCriticalSection(GlobalSection);
      LocalVariable := GlobalVariable+'a'; { modify GlobalVariable }
      GlobalVariable := LocalVariable;
      LeaveCriticalSection(GlobalSection);
      ....
    end;

    procedure TThreadTwo.Execute;
    var LocalVariable: string;
    begin
      ...
      EnterCriticalSection(GlobalSection);
      LocalVariable := GlobalVariable; { thread-safe read GlobalVariable }
      LeaveCriticalSection(GlobalSection);
      ....
    end;

Using a local variable makes the critical section shorter, so your application will scale better and make use of the full power of your CPU cores. Between `EnterCriticalSection` and `LeaveCriticalSection`, only one thread runs: other threads wait in the `EnterCriticalSection` call. So the shorter the critical section, the faster your application. Some wrongly designed multi-threaded applications can actually be slower than single-threaded ones!
And do not forget that if the code inside the critical section may raise an exception, you should always write an explicit `try ... finally LeaveCriticalSection() end;` block to protect the lock release and prevent any deadlock of your application.
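The exception-safe pattern can be sketched as follows. `AppendToGlobal` is a hypothetical helper and the raised exception is only a simulated failure. The sketch assumes Delphi on Windows, with `TRTLCriticalSection` and the critical section API coming from the `Windows` unit.

```pascal
program CriticalSectionDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

var
  GlobalVariable: string;
  GlobalSection: TRTLCriticalSection;

// Hypothetical helper: appends Suffix to the shared string under the lock.
procedure AppendToGlobal(const Suffix: string);
begin
  EnterCriticalSection(GlobalSection);
  try
    if Suffix = '' then
      raise Exception.Create('Empty suffix'); // simulated failure inside the lock
    GlobalVariable := GlobalVariable + Suffix;
  finally
    LeaveCriticalSection(GlobalSection); // runs on normal exit and on exception
  end;
end;

begin
  InitializeCriticalSection(GlobalSection);
  try
    AppendToGlobal('a');
    try
      AppendToGlobal(''); // raises, but the finally block still releases the lock
    except
      on E: Exception do
        Writeln('Caught: ', E.Message);
    end;
    AppendToGlobal('b');
    Writeln(GlobalVariable); // prints ab
  finally
    DeleteCriticalSection(GlobalSection);
  end;
end.
```

Without the inner `try ... finally`, the exception would leave the section owned by the failing thread, and every other thread calling `EnterCriticalSection` would wait forever.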
Delphi is perfectly thread-safe if you protect your shared data with a lock, i.e. a critical section. Be aware that even reference-counted variables (like strings) must be protected, even though there is a LOCK inside their RTL functions: this LOCK is there to ensure correct reference counting and avoid memory leaks, but it does not make access to the variable itself thread-safe. To make it as fast as possible, see this SO question.
The purpose of `InterlockedExchange` and `InterlockedCompareExchange` is to change a shared pointer variable value. You can see it as a "light" version of the critical section to access a pointer value.
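A rough sketch of this "light lock" idea: several threads race to claim a one-time task, and `InterlockedExchange` guarantees that exactly one of them wins. The sketch uses the Integer form of `InterlockedExchange` rather than a pointer, on the assumption that the pointer variants' names and signatures may differ between Delphi versions. `TClaimThread`, `Claimed` and `WinnerCount` are hypothetical names for this illustration, and Delphi on Windows is assumed.

```pascal
program InterlockedFlagDemo;

{$APPTYPE CONSOLE}

uses
  Windows, Classes;

var
  Claimed: Integer = 0;      // 0 = task not claimed yet, 1 = claimed
  WinnerCount: Integer = 0;  // how many threads believed they won

type
  // hypothetical worker thread, used only for this illustration
  TClaimThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TClaimThread.Execute;
begin
  // Atomically store 1 and fetch the previous value:
  // only the very first caller can observe 0.
  if InterlockedExchange(Claimed, 1) = 0 then
    InterlockedIncrement(WinnerCount);
end;

var
  Workers: array[0..3] of TClaimThread;
  i: Integer;
begin
  for i := 0 to High(Workers) do
    Workers[i] := TClaimThread.Create(False); // False = start immediately
  for i := 0 to High(Workers) do
  begin
    Workers[i].WaitFor;
    Workers[i].Free;
  end;
  Writeln(WinnerCount); // prints 1
end.
```

This only protects the single exchanged value. As soon as several variables must change together, a critical section is still the simpler tool.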
In all cases, writing working multi-threaded code is not easy - it's even hard, as a Delphi expert just wrote in his blog.
You should either write simple threads with no shared data at all (make a private copy of the data before the thread starts, or use read-only shared data - which is thread-safe by essence), or call some well designed and proven libraries - like http://otl.17slon.com - which will save you a lot of debugging time.