0

Currently I'm dealing with a multi thread issue on a server program with .NET 6.0. The method with issue likes below:

private async Task<Cell> LoadCell()
{
    if (_loadingLock.TryEnterUpgradeableReadLock(new TimeSpan(0, 0, 0, 0, 20)))
    {
        _globalReadlockCountEnter++;
        Log.Info("_loadingLock Enter upgrade readlock, count: " + _globalReadlockCountEnter);
        try
        {
            if (_mapLock.TryEnterReadLock(new TimeSpan(0, 0, 0, 0, 20)))
            {
                try
                {
                    //Try get value
                }
                finally
                {
                    _mapLock.ExitReadLock();
                }
            }
            else
            {
                throw new TimeoutException("Timeout trying to get read lock in level cell manager.");
            }

            bool acquired = false;
            try
            {
                if (!_loadingLock.TryEnterWriteLock(new TimeSpan(0, 0, 0, 20)))
                {
                    throw new TimeoutException("Timeout trying to get write lock in level cell manager.");
                }
                else
                {
                    acquired = true;
                    _globalWritelockCountEnter++;
                    Log.Info("_loadingLock Enter writelock, count: " + _globalWritelockCountEnter);
                    //deal with certain value
                }
            }
            catch (Exception e)
            {
                throw new Exception("Unknown exception error.");
            }
            finally
            {
                if (acquired)
                {
                    if (!_loadingLock.IsWriteLockHeld)
                    {
                        Log.Warn("write lock has not been held, check is everything alright?");
                    }

                    _loadingLock.ExitWriteLock();
                    _globalWritelockCountExit++;
                    Log.Info("_loadingLock exit writelock, count: " + _globalWritelockCountExit);
                }
                else
                {
                    Log.Warn("acquire is false some how it failed to exit the write lock.");
                }
            }
        }
        finally
        {
            if (!_loadingLock.IsUpgradeableReadLockHeld)
            {
                Log.Warn("upgrade read lock has not been held, check is everything alright?");
            }

            _loadingLock.ExitUpgradeableReadLock();
            _globalReadlockCountExit++;
            Log.Info("_loadingLock Exit upgrade readlock, count: " + _globalReadlockCountExit);
        }
    }
    else
    {
        throw new TimeoutException("Timeout trying to get upgrade read lock in level cell manager.");
    }
}

The logs and error messages are like:

2023-07-19 07:43:48,601 INFO _loadingLock Exit upgrade readlock, count: 1060
2023-07-19 07:43:48,601 INFO _loadingLock Enter upgrade readlock, count: 1061
2023-07-19 07:43:48,601 INFO _loadingLock Enter writelock, count: 358
2023-07-19 07:43:48,602 INFO _loadingLock exit writelock, count: 358
2023-07-19 07:43:48,602 INFO _loadingLock Exit upgrade readlock, count: 1061
2023-07-19 07:43:48,602 INFO _loadingLock Enter upgrade readlock, count: 1062
2023-07-19 07:43:48,602 INFO _loadingLock Enter writelock, count: 359
2023-07-19 07:43:48,603 INFO _loadingLock exit writelock, count: 359
2023-07-19 07:43:48,603 INFO _loadingLock Exit upgrade readlock, count: 1062
2023-07-19 07:43:48,603 INFO _loadingLock Enter upgrade readlock, count: 1063
2023-07-19 07:43:48,603 INFO _loadingLock Exit upgrade readlock, count: 1063
2023-07-19 07:43:48,604 INFO _loadingLock Enter upgrade readlock, count: 1064
2023-07-19 07:43:48,604 INFO _loadingLock Exit upgrade readlock, count: 1064
2023-07-19 07:43:48,604 INFO _loadingLock Enter upgrade readlock, count: 1065
2023-07-19 07:43:48,604 INFO _loadingLock Exit upgrade readlock, count: 1065
...
2023-07-19 07:43:48,604 INFO _loadingLock Enter upgrade readlock, count: 1080
2023-07-19 07:43:48,604 INFO _loadingLock Exit upgrade readlock, count: 1080
2023-07-19 07:43:48,604 INFO _loadingLock Enter upgrade readlock, count: 1081
2023-07-19 07:43:48,604 INFO _loadingLock Exit upgrade readlock, count: 1081
2023-07-19 07:43:48,604 INFO _loadingLock Enter upgrade readlock, count: 1082
2023-07-19 07:43:48,604 INFO _loadingLock Enter writelock, count: 360
2023-07-19 07:43:48,623 WARN write lock has not been held, check is everything alright?
2023-07-19 07:43:48,623 WARN upgrade read lock has not been held, check is everything alright?
Unhandled exception. Unhandled exception. System.Threading.SynchronizationLockException: The upgradeable lock is being released without being held.

The issue seems happened occasionally when the method is being called, sometimes it can keep two days without any issue and sometimes it happens just after the server start for a while, and every time before it happens, all enter/exit lock could be paired. I tried spamming the logs to check what happened, and as you can see above, I finally found it's always the write lock doesn't get exit(or at least I didn't see the log when it should exit) and then when it goes the exit upgradeable read lock, exception throws.

I thought from the code the workflow should be alright that enter upgradeable readlock -> enter writelock -> exit writelock -> exit upgradeable readlock and it shouldn't cause any trouble as the example mentioned in official documentation, but in actual case, I'm not sure whether it's something I did wrong or it concerns something of base level which I'm not familiar with. As I understood, the upgradeable readlock should be something that deals with the race condition, so the lock shouldn't be released by another thread. Also I know that for IsXXXLockHeld property, as mentioned in official docs: This property is intended for use in asserts or for other debugging purposes. Do not use it to control the flow of program execution(Mention this here as I've seen that in some simular issues that some one recommand using this for flow control).

I've tried using both try or none-try enter lock method but the result is usually same, well, if uses none-try enter lock it sometimes throw exceptions at the enterUpgradeableReadlock step, give me a feeling that previous lock has not exited.

Not sure what to do know and any tip or help is appreciated.

Eric MA
  • 11
  • 2
  • 1
    I'm immediately wary of any method marked `async` that is using locking primitives tied to thread identity. However, with no `await`s present in the cutdown sample, it's tricky to say it's *definitely* the issue – Damien_The_Unbeliever Jul 20 '23 at 08:09
  • See e.g. https://stackoverflow.com/q/19659387/15498 – Damien_The_Unbeliever Jul 20 '23 at 08:17
  • Hi, thanks for the tips. Actually there is an await in the //deal with certain value part, it looks like: while (_loadingReferences.Contains(coord) && maxIter-- > 0) {await Task.Delay(500); } with a comment: "// Another thread has already requested this object. Wait for a while. " I have viewed that post, but I guess that the upgradeable lock should handle the async case? – Eric MA Jul 20 '23 at 09:58
  • No, the problem is that after your code resumes after that `await`, you may not be on the same thread still - the thread you're now on may not have interacted with the lock at all and, even if it had, by definition we would know (if you have moved threads) that you're not on the thread that asked for the upgradeable lock. – Damien_The_Unbeliever Jul 20 '23 at 12:52
  • @Charlieface Sorry I didn't make it clear. All the read/write locks here are from ReaderWriterLockSlim. Actually I thought people will check the official doc with hyperlink which I mentioned in the op. – Eric MA Jul 21 '23 at 01:30
  • @Damien_The_Unbeliever So... as Charlieface mentioned that ReaderWriterLockSlim which I'm actually using here(Sorry again for forget to mention that), does it make any difference? I know race condition between threads is a hard part to deal with but I think there should be an official solution? As I said, actually I'm following the official documentation(with hyperlink) in origin post. – Eric MA Jul 21 '23 at 01:33
  • 1
    Did you follow the link I already posted to another Q&A here? One specifically asking about `ReaderWriterLockSlim` and `async`/`await`? There are three specific suggestions in the accepted answer and *none* of them are "using a `ReaderWriterLockSlim` is absolutely fine." – Damien_The_Unbeliever Jul 21 '23 at 05:09
  • hmm... maybe you're right. Let me try to remove that await then check. – Eric MA Jul 21 '23 at 06:26
  • I take it back, I actually meant `SemaphoreSlim` supports `await` although that would need some work to convert into a full `ReaderWriterLock` style – Charlieface Jul 21 '23 at 09:35

1 Answers1

1

So it turns out that both upgrade read/write lock and the normal read/write lock are not async-affine(though I thought the upgrade locks should be the solution of this issue). After some researches I think the SemaphoreSlim is the right direction to solve this, however, that will need a rework for me. If any one who met same issue and don't want a rework on this, you can try AsyncReaderWriterLockSlim which I found on Github, internally using SemaphoreSlim to make async read/write locks.

Eric MA
  • 11
  • 2