Faster TMultiReadExclusiveWriteSynchronizer?

Question

Is there a faster kind of TMultiReadExclusiveWriteSynchronizer out there? FastCode perhaps?

Starting with Windows Vista, Microsoft added a Slim Reader/Writer lock. It performs much better than Delphi's TMultiReadExclusiveWriteSynchronizer. Unfortunately it only exists in Windows Vista and later, something which few customers actually have yet.

Presumably the concepts in use inside a Slim Reader/Writer lock could be redone in native Delphi code - but has anyone done it?

i have a situation where acquiring and releasing locks on a TMultiReadExclusiveWriteSynchronizer (even when there's no contention - a single thread), causes 100% overhead (the operation time doubles). i can run without locking, but then my class is no longer thread-safe.

Is there a faster TMultiReadExclusiveWriteSynchronizer?

Note: If i use a TCriticalSection i only suffer a 2% performance hit (although critical sections are known to be fast when the acquire succeeds, i.e. while it's single threaded and there's no contention). The downside of a CS is that i lose the "multiple readers" capability.

The Measurements

Using TMultiReadExclusiveWriteSynchronizer a sizable amount of time is spent inside BeginRead and EndRead:

enter image description here

i then ported the code to use Window's own SlimReaderWriter Lock (which some code rewrite, as it doesn't support recursive lock taking), and profiled the resutls:

TMultiReadExclusiveWriteSynchronizer: 10,698 ns per iteration
10,697,772,613 ns to iterate 1,000,000 times
SRWLock: 8,802 ns per iteration
8,801,678,339 ns to iterate 1,000,000 times
Omni Reader-Writer lock: 8,941 ns per iteration
8,940,552,487 ns to iterate 1,000,000 times

A 17% improvement when using SRWLocks (aka Omni's spinning lock).

Now, i cannot switch the code permanantly over to use Windows Vista SRWLocks, as there are some entire enterprises of customers that are still on Windows XP.

The Slim locks are just careful use of InterlockedCompareExchange functions; but more careful than i can successfully use. I'm this far away from just stealing the 140 machine instructions involved, and have it done.

Bonus Reading

@DavidHeffernan Oh i had no intention of writing something like that myself. Based on the RTL source code containing `$IFDEF` debugging lines, it would be impossible to write myself. That's why i'm looking for a 3rd party implementation. i don't mind non-native Delphi code - as long as it can be compiled into the native single Win32 executable. — Ian Boyd, Apr 30 '12 at 04:00
Why are you surprised that so few business customers have not yet upgraded to Windows 7? i have some customers that are hoping that they will soon move SQL Server databases off Windows 2000 Server. — Ian Boyd, Apr 30 '12 at 04:02
Having to ship `gdiplus.dll` with the application in case they run it on Windows 2000 is still a *joy*. — Ian Boyd, Apr 30 '12 at 04:10
@DavidHeffernan Well, not only does the code have to run on Windows XP, it has to run on Windows 2000. (Thank god we no longer have anyone using Windows NT). — Ian Boyd, Apr 30 '12 at 04:23
@DavidHeffernan Unfortunately SRWLocks are [not a drop-in replacement for `TMultiReadExclusiveWriteSynchronizer`](http://blog.delphi-jedi.net/2011/09/16/slim-readwrite-lock-the-fast-alternative-to-tmultireadexclusivewritesynchronizer/). And like i said, i'd rather avoid trying to write multi thread synchronization code myself. — Ian Boyd, Apr 30 '12 at 04:29
i'm not doing it myself. It's not easy to do correctly (Borland/Imprise/Embarcadero fiddled with their version for years). It's not easy to do fast (Imprise's is ten times slower than Microsoft's). If there is none out there, then that's the answer. There's no shame in writing that as the answer. — Ian Boyd, Apr 30 '12 at 04:33
@DavidHeffernan RTT: "Faster TMultiReadExclusiveWriteSynchronizer". It's confusing, i know. — Ian Boyd, Apr 30 '12 at 04:33
I have no idea if it is faster, but there is one in FreePascal: http://www.freepascal.org/docs-html/rtl/sysutils/tmultireadexclusivewritesynchronizer.html — Runner, Apr 30 '12 at 06:31
@Runner i looked into FreePascal's `sysuthrd.inc`. After reading a lot about SRWlocks, the reason they're so fast is that they use spinlocks (i.e. loop 4000 times, checking if it's free yet). They also multiplex reader counters, writer counts, exclusive lock all into a single 32-bit number. Making them very small, and suitable for *`InterlockedXxx`* operations. — Ian Boyd, Apr 30 '12 at 19:27
I usually solve this by designing so that locking is unnecessary. I know that sounds disingenuous but I've found that there are two dominant cases for multi-threading, which I will christen "responsive-UI" and "network service". Responsive UI is normally handled using message queueing, and properly designed service sessions don't require locking because a given session is effectively single threaded. The only case I've found in which I genuinely need locking is maintaining the collection of active sessions in a network service. — Peter Wone, May 25 '12 at 03:35
@PeterWone In this case locking is used to protect multi-threaded access to a shared resource. — Ian Boyd, May 25 '12 at 14:30
If you are locking and releasing so often that the locking is causing noticeable performance problems, then you have sacrificed too much single-threaded performance for concurrency. There are lots of ways to push the trade-off in the other direction. For example, instead of acquiring/releasing the lock on each operation, hold it across many operations. — David Schwartz, Jun 06 '12 at 10:33
@DavidSchwartz i'm not experiencing delays from contention; the act of locking itself should be faster. And algorithms have been invented in the last 10 years that can improve the speed of locks themselves. — Ian Boyd, Jun 06 '12 at 13:08
Since you aren't experiencing delays from contention, all your locking and unlocking (which is done to avoid contention) is needless. Just keep the lock held. You'll get a bit more contention (which is fine) and you'll avoid all the needless locks/unlocks that are hosing your single-thread performance. (As we say in the biz, you need to increase the scope of your locks.) — David Schwartz, Jun 06 '12 at 15:28
@DavidSchwartz i can't just keep the lock held; that would block any other threads from reading or writing. Also, that's unrelated to my question - which is about version of `TMultiReadExclusiveWriteSynchronizer` that is as fast as it can be. — Ian Boyd, Jun 26 '12 at 16:55
You aren't experiencing any delays from contention. So there is no problem with blocking other threads. You're refusing to fix the problem you *have* on the grounds that it won't fix a problem you *don't have*. You have sacrificed single-threaded performance massively to improve concurrency -- but you didn't need concurrency, you needed single-thread performance. So dial the trade-off in the other direction. — David Schwartz, Jun 27 '12 at 15:27
@DavidSchwartz What would you suggest? i cannot hold a lock for an extended period of time. — Ian Boyd, Jun 27 '12 at 16:05
@IanBoyd: I'm not suggesting you hold a lock for "an extended period of time". Just hold the lock long enough so that the cost of acquiring and releasing the lock is insignificant. For example, aim to hold the lock at least ten times as long as the time taken to acquire and release the lock. Acquiring and releasing a lock should be fast enough so that ten times that long is still not an extended period of time. There is no point in repeatedly releasing a lock just to have to immediately re-acquire it. — David Schwartz, Jun 27 '12 at 16:08
@DavidSchwartz The problem is that this is a thread-safe (i.e. COM ThreadModel=Free) object. Nobody "locks" anything; the class is thread-safe. — Ian Boyd, Jun 27 '12 at 18:00
@IanBoyd: That design results in each operation on the object acquiring and releasing locks. That's the problem. If you want to fix the problem, that's what you need to fix. That design is a bad choice because it makes huge sacrifices in single-thread speed in favor of concurrency -- but as you can see, your specific case requires the trade off to go in the other direction. (There might be very simple ways to change this, but it's hard to say without knowing more details. For example, you may be able to create higher-level operations on the object and thus need fewer of them.) — David Schwartz, Jun 27 '12 at 18:13
@DavidSchwartz The problem has been solved already; by using Interlocked operations. Starting with Windows Vista, Microsoft provides an implementation to developers called `SlimReaderWriterLock` (SRW). My question was about an implementation written in Delphi; by someone more knowledgeable than me. — Ian Boyd, Jun 27 '12 at 20:03
It's hard to recommend a particular technique without knowing why you need these locks. What's your use case? — Ross Judson, Aug 09 '12 at 13:13
@RossJudson A list that multiple threads can read and write from at the same time (think COM's "free" threading model) — Ian Boyd, Aug 11 '12 at 15:37
@IanBoyd You require retention of order -- that is, you need a list, as opposed to a bag (or set)? What operations do you require against the list? — Ross Judson, Aug 20 '12 at 16:09
@RossJudson i would hope that this fast `TMultiReadExclusiveWrite` can be used against a "List" (ordered), a "Collection" (unordered), a Hash list. And while we're at it: an XML DOM tree, a single object variable. In other words: don't confuse the application i have *today* with an application if will have *tomorrow*. In other words: don't confuse the question with the example. — Ian Boyd, Aug 22 '12 at 12:22
@IanBoyd you could try and reproduce the behaviour of SRW: you use spinlocks, and then fall back to TMultiReadExclusiveWrite after some (4k?) spins. You don't have to invent a new algorithm, you just need to port from, for example, C# to Delphi. Check out Joe Duffy's work (blog: http://www.bluebytesoftware.com, book: http://www.bluebytesoftware.com/books/winconc/winconc_book_resources.html) or The art of Multiprocessor programming. Algorithm there are tested and proven to work, just translate them to Delphi. — Lorenzo Dematté, Jan 25 '13 at 08:18
@dema80 i'm just terrified of doing it wrong. i've been programming long enough that i know that concurrency is hard; and i'm likely to have *some* subtle bug. Likely around during destruction of the `TSlimReaderWriterLock`. — Ian Boyd, Jan 25 '13 at 18:06

score 5 · Accepted Answer · edited May 23 '17 at 10:29

TOmniMREW from OmniThreadLibrary claims to be faster and more lightweight:

OTL is an excellent threading lib, BTW.

Sample Code

TOmniReaderWriterLock = class(TInterfacedObject, IReaderWriterLock)
private
   omrewReference: Integer;
public
   { IReaderWriterLock }
   procedure BeginRead;
   procedure EndRead;
   procedure BeginWrite;
   procedure EndWrite;
end;

{ TOmniReaderWriterLock }

procedure TOmniReaderWriterLock.BeginRead;
var
  currentReference: Integer;
begin
    //Wait on writer to reset write flag so Reference.Bit0 must be 0 than increase Reference
    repeat
        currentReference := Integer(omrewReference) and not 1;
    until currentReference = Integer(InterlockedCompareExchange(Pointer(omrewReference), Pointer(Integer(currentReference) + 2), Pointer(currentReference)));
end;

procedure TOmniReaderWriterLock.EndRead;
begin
    //Decrease omrewReference
    InterlockedExchangeAdd(@omrewReference, -2);
end;

procedure TOmniReaderWriterLock.BeginWrite;
var
    currentReference: integer;
begin
    //Wait on writer to reset write flag so omrewReference.Bit0 must be 0 then set omrewReference.Bit0
    repeat
        currentReference := omrewReference and (not 1);
    until currentReference = Integer(InterlockedCompareExchange(Pointer(omrewReference), Pointer(currentReference+1), Pointer(currentReference)));

    //Now wait on all readers
    repeat
    until omrewReference = 1;
end;

procedure TOmniReaderWriterLock.EndWrite;
begin
    omrewReference := 0;
end;

Ian Boyd · Answer 2 · 2016-01-15T02:07:52.670

In the end i used a compromise solution. The Omni reader-writer lock uses "slim" principles (spinning bit-manipulations). Like Window's own, it doesn't support lock escalation. I've tested it, and it doesn't seem to ~~lockup~~ crash or deadlock.

In the end i used a fallback situation. The most generic of generic interfaces to support "read-write" concepts:

IReaderWriterLock = interface
   ['{6C4150D0-7B13-446D-9D8E-866B66723320}']
   procedure BeginRead;
   procedure EndRead;
   procedure BeginWrite;
   procedure EndWrite;
end;

And then we decide at runtime which implementation to use. If we're on Windows Vista or new, use Window's own SlimReaderWriter, otherwise fallback to Omni version:

TReaderWriterLock = class(TObject)
public
   class function Create: IReaderWriterLock;
end;

class function TReaderWriterLock.Create: IReaderWriterLock;
begin
   if Win32MajorVersion >= 6 then //SRWLocks were introduced with Vista/Server 2008 (Windows version 6.0)
   begin
      //Use the Windows built-in Slim ReaderWriter lock
      Result := TSlimReaderWriterLock.Create;
   end
   else
   begin
      //XP and earlier fallback to Omni equivalent
      Result := TOmniReaderWriterLock.Create;
   end;
end;

Note: Any code is released into the public domain. No attribution required.

score 2 · Answer 3 · answered Oct 04 '12 at 20:41

The Delphi TMultiReadExclusiveWriteSynchronizer is very sophisticated - it can be acquired recursively and you can update from Read to Write.

This comes with a cost, which in this case means managing a bucket of shared state per thread. As the Windows thread-local mechanics (accessible via threadvar) is too simplistic for this (not able cope with several MREWS instances) it is done in a rather inefficient way – see the RTL or JCL sources – the implementations are quite similar, sharing bad performance and update-deadlock risk.

First ensure you really need the MREWS functionality – I assume, according to proportional size of locking overhead to the workload, you will be much better off with a TCriticalSection.

If you really-really need it, go with the Delphi implementation and watch out for the possible hidden unlock in BeginWrite – see it's documentation and return value meaning.

It is possible to implement a Vista-like SRW using the Interlocked functions or inline assembly, but it's not worth the effort in most cases.

Is it really possible to update Read state to the Write for TMREWSync? — Alex Egorov, Feb 29 '20 at 07:33

score 0 · Answer 4 · answered Jun 26 '12 at 15:59

The JCL has a MREWS that is a different implementation which might work for you. Not sure what version of windows it requires.

http://wiki.delphi-jedi.org/wiki/JCL_Help:TJclMultiReadExclusiveWrite

http://wiki.delphi-jedi.org/index.php?title=JEDI_Code_Library

score 0 · Answer 5 · answered Jan 20 '14 at 11:28

Try this? It can be used as a normal variable:

type myclass=class
              Lock:TOBRWLock;
              function ReadIt:Integer;
              procedure WriteIt(A:Integer);
             end;  
function ReadIt:Integer;
begin;
 Lock.LockRead;
 Result:=GetVal;
 Lock.UnLockRead;
end;

There is much space for improvement and you can build from here varieties that favour read above write or just act differently depending on the need.

const ldFree    = 0;
      ldReading = 1;
      ldWriting = 2;
type TOBRWLock          = record
                 [Volatile]WritersWaiting,
                           ReadersWaiting,
                           ReadersReading,
                           Disposition    : Integer;
                           procedure LockRead;
                           procedure LockWrite;
                           procedure UnlockRead;
                           procedure UnlockWrite;
                           procedure UnReadWrite;
                           procedure UnWriteRead;
                          end;

procedure TOBRWLock.LockRead;
var SpinCnt : NativeUInt;
    I       : Integer;
begin
 SpinCnt:=0;
 TInterlocked.Increment(ReadersWaiting);
 repeat
  if (Disposition=ldReading)
     then begin
           I:=TInterlocked.Increment(ReadersReading);
           if (Disposition<>ldReading) or (I=1)(*Only 1 read reference or Disposition changed, suspicious, rather retry*)
              then begin
                    TInterlocked.Decrement(ReadersReading);
                    continue;
                   end
              else begin(*Success*)
                    TInterlocked.Decrement(ReadersWaiting);
                    break;
                   end;
          end;
  if (WritersWaiting<>0)or(Disposition<>ldFree)
     then begin
           SpinBackoff(SpinCnt);
           continue;
          end;
  if TInterlocked.CompareExchange(Disposition,ldReading,ldFree)=ldFree
     then begin
           TInterlocked.Increment(ReadersReading);
           TInterlocked.Decrement(ReadersWaiting);
           break;
          end;
  SpinBackoff(SpinCnt);
 until False;
end;

procedure TOBRWLock.LockWrite;
var SpinCnt : NativeUInt;
begin
 SpinCnt:=0;
 TInterlocked.Increment(WritersWaiting);
 repeat
  if (Disposition<>ldFree)
     then begin
           SpinBackoff(SpinCnt);
           continue;
          end;
  if TInterlocked.CompareExchange(Disposition,ldWriting,ldFree)=ldFree
     then begin
           TInterlocked.Decrement(WritersWaiting);
           break;
          end
     else SpinBackoff(SpinCnt);
 until False;
end;

procedure TOBRWLock.UnlockRead;
begin
 {$IFDEF DEBUG}
 if Disposition<>ldReading
    then raise Exception.Create('UnlockRead a lock that is not Reading');
 {$ENDIF}
 TInterlocked.Decrement(ReadersReading);
 if ReadersReading=0
    then begin;
          if TInterlocked.CompareExchange(Disposition,ldFree,ldReading)<>ldReading
             then raise Exception.Create('Impossible 310');
         end;
end;

procedure TOBRWLock.UnlockWrite;
begin
 {$IFDEF DEBUG}
 if Disposition<>ldWriting
    then raise Exception.Create('UnlockWrite a lock that is not Writing');
 {$ENDIF}
 if TInterlocked.CompareExchange(Disposition,ldFree,ldWriting)<>ldWriting
    then raise Exception.Create('Impossible 321');
end;

procedure TOBRWLock.UnReadWrite;
var SpinCnt : NativeUInt;
begin
 {$IFDEF DEBUG}
 if Disposition<>ldReading
    then raise Exception.Create('UnReadWrite a lock that is not Reading');
 {$ENDIF}
 TInterlocked.Increment(WritersWaiting);
 SpinCnt:=0;
 repeat
  if ReadersReading=1(*Only me reading*)
     then begin;
           if TInterlocked.CompareExchange(Disposition,ldWriting,ldReading)<>ldReading(*Must always succeed*)
              then raise Exception.Create('Impossible 337');
           TInterlocked.Decrement(ReadersReading);
           TInterlocked.Decrement(WritersWaiting);
           break;
          end;
  SpinBackoff(SpinCnt);
 until False;
end;

Faster TMultiReadExclusiveWriteSynchronizer?

The Measurements

Bonus Reading

5 Answers5

Sample Code