10

I have a static class which is accessed by multiple remoting and other internal to the application threads. Part of the functionality of this class is controlling read/write access to various files, so I've implemented a static ReaderWriterLock on the list of files. The project uses the .net framework 2.0 as part of the customer requirements.

However when I stress test the system using a number of different clients (generally I'm using 16) each performing a large amount of reads and writes then very intermittently and only after several hours or even days have passed with at least 500k+ transactions completed the system crashes. Ok so we got a bug..

But when I check the logs of all locking events I can see that the following has happened:

1: Thread A acquires a write lock directly, checking IsWriterLock shows it to be true.

2: Thread B tries to acquire a reader lock and succeeds even though Thread A still has the write lock

3: System now crashes, stack trace now shows a null reference exception to the readerwriterlock

This process has been run several hundred thousand times previously with no errors and I can check the logs and see that the read lock was blocked in all cases previously until the write had exited. I have also tried implementing the readerwriterlock as a singleton but the issue still occurs

Has anybody ever seen anything like this before ??

A slimed down version of the readerwriterlock implementation used is shown below:

private const int readwriterlocktimeoutms = 5000;
    private static ReaderWriterLock readerWriterLock = new ReaderWriterLock();


    // this method will be called by thread A
    public static void MethodA()
    {
        // bool to indicate that we have the lock
        bool IsTaken = false;

        try
        {                
            // get the lock
            readerWriterLock.AcquireWriterLock(readwriterlocktimeoutms);

            // log that we have the lock for debug
            // Logger.LogInfo("MethodA: acquired write lock; writer lock held {0}; reader lock held {1}", readerWriterLock.IsWriterLockHeld.ToString(),readerWriterLock.IsReaderLockHeld.ToString(), );

            // mark that we have taken the lock
            IsTaken = true;
        }
        catch(Exception e)
        {
            throw new Exception(string.Format("Error getting lock {0} {1}", e.Message, Environment.StackTrace));
        }

        try
        {
           // do some work
        }
        finally
        {
            if (IsTaken)
            {
                readerWriterLock.ReleaseWriterLock();
            }
        }

    }

    // this method will be called by thread B
    public static void MethodB()
    {
        // bool to indicate that we have the lock
        bool IsTaken = false;

        try
        {
            // get the lock
            readerWriterLock.AcquireReaderLock(readwriterlocktimeoutms);

            // log that we have the lock for debug
            // Logger.LogInfo("MethodB: acquired read lock; writer lock held {0}; reader lock held {1}", readerWriterLock.IsWriterLockHeld.ToString(),readerWriterLock.IsReaderLockHeld.ToString(), );

            // mark that we have taken the lock
            IsTaken = true;
        }
        catch (Exception e)
        {
            throw new Exception(string.Format("Error getting lock {0} {1}", e.Message, Environment.StackTrace));
        }

        try
        {
           // do some work
        }
        finally
        {
            if (IsTaken)
            {
                readerWriterLock.ReleaseReaderLock();
            }
        }
    }

enter code here
John Saunders
  • 160,644
  • 26
  • 247
  • 397
Johnv2020
  • 2,088
  • 3
  • 20
  • 36
  • To be complete, can you add the IsTaken=true statements? – H H Apr 25 '11 at 11:09
  • I don't know if it explains things, but that code is plain wrong; the release ***must*** be part of the first try/finally - otherwise any exception in the actual work code will result in the lock not being released. – Marc Gravell Apr 25 '11 at 11:34
  • Marc - not sure I agree with you there. If any exception is thrown in the work code then the finally method must always be called prior to the application or thread terminating. So the lock should always be released. – Johnv2020 Apr 25 '11 at 12:04
  • 1
    @JohnV2020 - Based on your code, it appears as if you "do some work" outside the try-finally, as the comment is outside the try-finally. I'm assuming you mean you do the work inside the try block. – CodeNaked Apr 25 '11 at 12:13
  • Correct - work is carried out within the try/finally block – Johnv2020 Apr 25 '11 at 12:14
  • You are doing work (in both methods) regardless of whether you got the lock. Is that really what you want? – Fantius Apr 25 '11 at 14:38
  • @Fantius: There is a re-throw preventing that. – H H Apr 25 '11 at 15:21
  • Oh, right. BTW, you should use the Exception constructor that allows you to pass an inner Exception. As you have it, you are loosing the StackTrace information of the original exception. – Fantius Apr 25 '11 at 18:36
  • True, but that's the least of my worries right now. The big issue here is that read locks are getting acquired after a write lock has been taken. – Johnv2020 Apr 25 '11 at 19:22
  • You may try using the [ReaderWriterGate](http://msdn.microsoft.com/en-us/magazine/cc163532.aspx) or the OneManyResourceLock from the [Power Threading library](http://wintellect.com/PowerThreading.aspx), even if only to verify that it's an issue with the .NET ReaderWriterLock and not your code. – CodeNaked Apr 26 '11 at 12:24
  • Thanks - thats an excellent suggestion, didn't know about the readerwritergate or the resourcelock – Johnv2020 Apr 26 '11 at 17:07
  • Code looks good to me. Could you be possibly running in multiple App Domains? If so then there are multiple copies of the static readerWriterock. – Richard Schneider Apr 27 '11 at 00:57

2 Answers2

9

@All finally have a solution to this problem. @Yannick you were on the right track...

If MSDN says that it's impossible to have reader and writer lock held at same time.

Today I got confirmation from microsoft that in cases of very heavy load on multiprocessor systems (note: I could never reproduce this problem on an AMD system only on Intel) its possible for ReaderWriterLock class objects to become corrupted, the risk of this is increased if the numer of writers at any given stage grows as these can backup in the queue.

For the last two weeks I've been running using the .Net 3.5 ReaderWriterLockSlim class and have not encountered the issue, which corresponds to what Microsoft have confirmed that the readerwriterlockslim class does not have the same risk of corruption as the fat ReaderWriterLock class.

Johnv2020
  • 2,088
  • 3
  • 20
  • 36
3

If MSDN says that it's impossible to have reader and writer lock held at same time. Is it possible in your process to have 2 readerWriterLock objects at any time, for some other reason ?

Another thing strange, is that Debugging a thread using isWriterLockHeld, whereas the current thread is a reader one, don't allow you to know about writing within another thread. How do you know that Thread A still holds a writer lock, and how do you know that it's not the debug-Logging system that delay or "mix" instructions given by threads ?

Other thought, is it possible that other resource shared leads to a deadlock ? That would results somehow to a crash ? (while, Null Exception is still strange unless where consider the deadlock cleaned and readerWriterLock reset.

Your problem is strange, true.

And other question, that won't solve your problem. What do you use isTaken, whereas in debugging your application you rely on isWriterLockHeld (or isReaderLockHeld) ? why not use it in your finally blocks ?

Yannick
  • 133
  • 4