
I'm doing a very silly benchmark on ReaderWriterLockSlim with this code, where reading happens 4x more often than writing:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

class Program
{
    static void Main()
    {
        ISynchro[] test = { new Locked(), new RWLocked() };

        Stopwatch sw = new Stopwatch();

        foreach ( var isynchro in test )
        {
            sw.Reset();
            sw.Start();
            Thread w1 = new Thread( new ParameterizedThreadStart( WriteThread ) );
            w1.Start( isynchro );

            Thread w2 = new Thread( new ParameterizedThreadStart( WriteThread ) );
            w2.Start( isynchro );

            Thread r1 = new Thread( new ParameterizedThreadStart( ReadThread ) );
            r1.Start( isynchro );

            Thread r2 = new Thread( new ParameterizedThreadStart( ReadThread ) );
            r2.Start( isynchro );

            w1.Join();
            w2.Join();
            r1.Join();
            r2.Join();
            sw.Stop();

            Console.WriteLine( isynchro.ToString() + ": " + sw.ElapsedMilliseconds.ToString() + "ms." );
        }

        Console.WriteLine( "End" );
        Console.ReadKey( true );
    }

    static void ReadThread(Object o)
    {
        ISynchro synchro = (ISynchro)o;

        for ( int i = 0; i < 500; i++ )
        {
            Int32? value = synchro.Get( i );
            Thread.Sleep( 50 );
        }
    }

    static void WriteThread( Object o )
    {
        ISynchro synchro = (ISynchro)o;

        for ( int i = 0; i < 125; i++ )
        {
            synchro.Add( i );
            Thread.Sleep( 200 );
        }
    }

}

interface ISynchro
{
    void Add( Int32 value );
    Int32? Get( Int32 index );
}

class Locked:List<Int32>, ISynchro
{
    readonly Object locker = new object();

    #region ISynchro Members

    public new void Add( int value )
    {
        lock ( locker ) 
            base.Add( value );
    }

    public int? Get( int index )
    {
        lock ( locker )
        {
            if ( this.Count <= index )
                return null;
            return this[ index ];
        }
    }

    #endregion
    public override string ToString()
    {
        return "Locked";
    }
}

class RWLocked : List<Int32>, ISynchro
{
    ReaderWriterLockSlim locker = new ReaderWriterLockSlim();

    #region ISynchro Members

    public new void Add( int value )
    {
        try
        {
            locker.EnterWriteLock();
            base.Add( value );
        }
        finally
        {
            locker.ExitWriteLock();
        }
    }

    public int? Get( int index )
    {
        try
        {
            locker.EnterReadLock();
            if ( this.Count <= index )
                return null;
            return this[ index ];
        }
        finally
        {
            locker.ExitReadLock();
        }
    }

    #endregion

    public override string ToString()
    {
        return "RW Locked";
    }
}

But I find that both perform in more or less the same way:

Locked: 25003ms.
RW Locked: 25002ms.
End

Even when making reads 20 times more frequent than writes, the performance is still (almost) the same.

Am I doing something wrong here?

Kind regards.

– vtortola

11 Answers


In your example, the sleeps mean that generally there is no contention. An uncontended lock is very fast. For this to matter, you would need a contended lock; if there are writes in that contention, they should be about the same (lock may even be quicker) - but if it is mostly reads (with write contention only rarely), I would expect the ReaderWriterLockSlim lock to outperform the lock.

Personally, I prefer another strategy here, using reference-swapping - so reads can always read without ever checking / locking / etc. Writes make their change to a cloned copy, then use Interlocked.CompareExchange to swap the reference (re-applying their change if another thread mutated the reference in the interim).
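
A rough sketch of that strategy (the class and member names here are just illustrative, not from any particular library): Get reads the current list reference with no check or lock at all, and Add clones, mutates the clone, and retries with Interlocked.CompareExchange until its swap sticks.

class CopyOnWriteList : ISynchro
{
    // Readers always see a complete snapshot that is never mutated after publication.
    List<Int32> items = new List<Int32>();

    public Int32? Get( Int32 index )
    {
        // Reference reads are atomic; no lock and no retry needed on the read path.
        List<Int32> snapshot = items;
        if ( snapshot.Count <= index )
            return null;
        return snapshot[ index ];
    }

    public void Add( Int32 value )
    {
        while ( true )
        {
            List<Int32> original = items;
            List<Int32> copy = new List<Int32>( original );   // clone...
            copy.Add( value );                                 // ...and apply the change to the clone

            // Publish the clone only if no other writer swapped the reference in the interim;
            // otherwise loop and re-apply the change to the newer list.
            if ( Interlocked.CompareExchange( ref items, copy, original ) == original )
                return;
        }
    }
}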

– Marc Gravell
  • Copy-and-swap-on-write is definitely the way to go here. – LBushkin Nov 18 '10 at 17:20
  • Could you please give me more information about the reference-swapping strategy? Many thanks. – vtortola Nov 18 '10 at 17:38
  • Non-locking copy-and-swap-on-write completely avoids resource contention or locking for readers; it is fast for writers when there is no contention, but can become badly log-jammed in the presence of contention. It may be good to have writers acquire a lock before making the copy (readers wouldn't care about the lock) and release the lock after doing the swap. Otherwise if 100 threads each attempt one simultaneous update, they could in worst case have to make an average of about 50 copy-update-try-swap operations each (5,000 total copy-update-try-swap operations to perform 100 updates). – supercat Nov 18 '10 at 18:15
  • Marc are you able to give an example of the reference-swapping? – mcmillab Apr 29 '13 at 02:01
  • @mcmillab I posted one very recently actually... maybe in the last 3 weeks? I'm not well-placed to look right now, though – Marc Gravell Apr 29 '13 at 02:03
  • here's how I think it should look, please tell me if wrong: `bool bDone=false; while(!bDone) { object origObj = theObj; object newObj = origObj.DeepCopy(); // then make changes to newObj if(Interlocked.CompareExchange(ref theObj, tempObj, newObj)==origObj) bDone=true; }` – mcmillab Apr 29 '13 at 04:56
  • @mcmillab basically, yes; in some simple cases (usually when wrapping a single value, or where it doesn't matter if later writes over-write things they never saw) you can even get away with losing the `Interlocked` and just using a `volatile` field and just assigning to it - but if you need to be **sure** that your change didn't get completely lost, then `Interlocked` is ideal – Marc Gravell Apr 29 '13 at 11:24
  • @supercat One CAS is guaranteed to succeed every time there is contention, because someone was first. Throughput will be steady just like a lock, and the threads that didn't make it will be delayed just like a lock, and locks aren't fair either. Does it matter that much whether the other threads are spending the waiting time on a lock or doing failed CASs? – jnm2 Apr 08 '15 at 15:35
  • @jnm2: When a thread waits on a lock, the core that had been executing it can switch to doing other work. Further, even if there were no other useful work to be done, having threads fighting over a resource with CAS will often tie up communications bandwidth, which may degrade the performance of all threads that need to access main memory (even those not involved with the resource). – supercat Apr 08 '15 at 15:44
  • @supercat: Makes sense. Why are the threaded collections in .NET 4 so CAS-heavy then? I thought they were more efficient. Also CAS + lock is redundant- with writers in a lock, all the readers have to do is Volatile.Read the reference to the latest object. – jnm2 Apr 08 '15 at 16:12
  • @jnm2: The collections use CompareExchange a lot, but I don't think they do a whole lot of copying. CAS+lock isn't necessarily redundant, if one uses it to allow for the possibility that not all writers will acquire a lock. For example, a resource for which contention will generally be rare but may occasionally be severe, might have a lock and a flag indicating whether writers should acquire it. If the flag is clear, writers simply do a CAS and, if it succeeds, they go on their way. A writer that fails CAS more than twice, however, could set the flag; the flag could then remain set... – supercat Apr 08 '15 at 16:15
  • ...as long as the contention remained. Even after the flag was set, there would be no guarantee that a CAS within the lock wouldn't lose out to a writer that had started work before the flag was set, but the number of such losses and their cumulative effect should be limited. Also, an advantage of using locks as a mechanism for alleviating contention rather than ensuring correctness is that a lock which seems to be held for an inordinate length of time may (depending upon the type of lock used) be forcibly released without sacrificing correctness. Such a policy would guard... – supercat Apr 08 '15 at 16:20
  • ...against having a resource become permanently unusable because a consumer got waylaid while in possession of the lock. – supercat Apr 08 '15 at 16:20

My own tests indicate that ReaderWriterLockSlim has about 5x the overhead of a normal lock. That means that for the RWLS to outperform a plain old lock, the following conditions would generally need to hold:

  • The number of readers significantly outnumbers the writers.
  • The lock would have to be held long enough to overcome the additional overhead.

In most real applications these two conditions are not enough to overcome that additional overhead. In your code specifically, the locks are held for such a short period of time that the lock overhead will probably be the dominating factor. If you were to move those Thread.Sleep calls inside the lock then you would probably get a different result.

– Brian Gideon
  • Not true anymore. On modern hardware even with equal readers and writers (4 readers and 4 writers) ReaderWriterLockSlim is approximately 9% faster than lock (which is a shortcut for Monitor). Benchmark it with BenchmarkDotNet and see. – tcwicks May 08 '22 at 00:17
  • it is true. My CPU is in the 96th percentile in the world benchmark and RW lock is 240% slower than lock – Boppity Bop Jun 01 '22 at 12:53

There's no contention in this program. The Get and Add methods execute in a few nanoseconds. The odds that multiple threads hit those methods at exactly the same time are vanishingly small.

Put a Thread.Sleep(1) call in them and remove the sleep from the threads to see the difference.
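
Applied to the question's code, that change would look something like this (the same tweak goes in Locked.Add and in both RWLocked methods):

// In Locked (and similarly in RWLocked):
public int? Get( int index )
{
    lock ( locker )
    {
        Thread.Sleep( 1 );            // simulate real work done while the lock is held
        if ( this.Count <= index )
            return null;
        return this[ index ];
    }
}

// ...and in ReadThread/WriteThread, drop the Thread.Sleep calls:
for ( int i = 0; i < 500; i++ )
{
    Int32? value = synchro.Get( i );
}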

– Hans Passant

Edit 2: Simply removing the Thread.Sleep calls from ReadThread and WriteThread, I saw Locked outperform RWLocked. I believe Hans hit the nail on the head here; your methods are too fast and create no contention. When I added Thread.Sleep(1) to the Get and Add methods of Locked and RWLocked (and used 4 read threads against 1 write thread), RWLocked beat the pants off of Locked.


Edit: OK, if I were actually thinking when I first posted this answer, I would've realized at least why you put the Thread.Sleep calls in there: you were trying to reproduce the scenario of reads happening more frequently than writes. This is just not the right way to do that. Instead, I would introduce extra overhead to your Add and Get methods to create a greater chance of contention (as Hans suggested), create more read threads than write threads (to ensure more frequent reads than writes), and remove the Thread.Sleep calls from ReadThread and WriteThread (which actually reduce contention, achieving the opposite of what you want).


I like what you've done so far. But here are a few issues I see right off the bat:

  1. Why the Thread.Sleep calls? These are just inflating your execution times by a constant amount, which is going to artificially make performance results converge.
  2. I also wouldn't include the creation of new Thread objects in the code that's measured by your Stopwatch. That is not a trivial object to create.

Whether you will see a significant difference once you address the two issues above, I don't know. But I believe they should be addressed before the discussion continues.

– Dan Tao
  • +1: Good point on #2--it can really skew results (same difference between each of the times, but different proportion). Though if you run this long enough, the time creating the `Thread` objects would start becoming insignificant. – Nelson Rothermel Nov 18 '10 at 17:05

You will get better performance with ReaderWriterLockSlim than with a simple lock if you lock a section of code that takes longer to execute; in that case readers can work in parallel. Acquiring a ReaderWriterLockSlim takes more time than entering a simple Monitor. Check my ReaderWriterLockTiny implementation for a readers-writer lock which is even faster than a simple lock statement and offers readers-writer functionality: http://i255.wordpress.com/2013/10/05/fast-readerwriterlock-for-net/

– i255
  • I like this. The article leaves me wondering whether Monitor is faster than ReaderWriterLockTiny on each platform? – jnm2 Apr 08 '15 at 15:45
  • While ReaderWriterLockTiny is faster than ReaderWriterLockSlim (and also Monitor), there is a drawback: if there are many readers that take quite some time, it never gives writers a chance (they will wait almost indefinitely). ReaderWriterLockSlim, on the other hand, manages to balance access to the shared resource between readers and writers. – tigrou May 07 '18 at 11:02

Uncontested locks take on the order of microseconds to acquire, so your execution time will be dwarfed by your calls to Sleep.

– MSN

Check out this article: Link

Your sleeps are probably long enough that they make your locking/unlocking statistically insignificant.

– Nelson Rothermel

Unless you have multicore hardware (or at least the same as your planned production environment) you won't get a realistic test here.

A more sensible test would be to extend the lifetime of your locked operations by putting a brief delay inside the lock. That way you should really be able to contrast the parallelism added using ReaderWriterLockSlim versus the serialization implied by basic lock().

Currently, the time taken by your locked operations is lost in the noise generated by the Sleep calls that happen outside the locks. The total time in either case is mostly Sleep-related.

Are you sure your real-world app will have equal numbers of reads and writes? ReaderWriterLockSlim is really better for the case where you have many readers and relatively infrequent writers. 1 writer thread versus 3 reader threads should demonstrate ReaderWriterLockSlim benefits better, but in any case your test should match your expected real-world access pattern.
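
For example, the thread setup in the question's Main could be skewed toward readers (the counts here are arbitrary, just for illustration):

// 1 writer, 3 readers - closer to the access pattern ReaderWriterLockSlim targets
Thread writer = new Thread( new ParameterizedThreadStart( WriteThread ) );
writer.Start( isynchro );

Thread[] readers = new Thread[ 3 ];
for ( int i = 0; i < readers.Length; i++ )
{
    readers[ i ] = new Thread( new ParameterizedThreadStart( ReadThread ) );
    readers[ i ].Start( isynchro );
}

writer.Join();
foreach ( Thread reader in readers )
    reader.Join();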

– Steve Townsend

I guess this is because of the sleeps you have in your reader and writer threads.
Your read thread sleeps 500 times for 50 ms each, which is 25,000 ms in total; most of the time it is just sleeping.

– Itay Karo

When is ReaderWriterLockSlim better than a simple lock?

When you have significantly more reads than writes.

– Illidan

The copy and compare-swap option is only viable when the contended resource is a single memory location (e.g. an int).

When a read operation can include a write (e.g. search for an element in a tree and insert it if not found), then it is quite likely that a copy followed by a compare-swap will not be enough, due to race conditions when updating multiple locations in memory (think of a head-tail queue).

The only viable choices (in my experience) in this situation are:

[1] Use spin locks (in a multi-core environment) when the typical transaction is a 'read' and is very short (such as a tree search, binary search, ...)

[2] Use the ReaderWriterLockSlim and upgrade during the transaction if an update is needed (see the sketch below).
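
A sketch of option [2]; the cache class and its members are illustrative names, not from any particular library. Only one thread can hold the upgradeable lock at a time, but plain readers (EnterReadLock) can still run alongside it:

class Cache
{
    readonly ReaderWriterLockSlim locker = new ReaderWriterLockSlim();
    readonly Dictionary<int, string> items = new Dictionary<int, string>();

    public string GetOrAdd( int key, Func<int, string> factory )
    {
        locker.EnterUpgradeableReadLock();
        try
        {
            string value;
            if ( items.TryGetValue( key, out value ) )
                return value;

            // Not found: upgrade to a write lock just for the insert.
            locker.EnterWriteLock();
            try
            {
                // No re-check needed here: all writes in this class go through
                // the upgradeable lock, and only one thread can hold it at a time.
                value = factory( key );
                items.Add( key, value );
                return value;
            }
            finally
            {
                locker.ExitWriteLock();
            }
        }
        finally
        {
            locker.ExitUpgradeableReadLock();
        }
    }
}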

– david marcus