
I found some multithreaded code in the quite popular LazyCache library that uses an int[] field as a granular locking mechanism, with the intention of preventing concurrent invocation of a method with the same key as argument. I am highly skeptical about the correctness of this code, because no Interlocked or Volatile operation is used when exiting the protected region. Here is the important part of the code:

private readonly int[] keyLocks;

public virtual T GetOrAdd<T>(string key, Func<ICacheEntry, T> addItemFactory,
     MemoryCacheEntryOptions policy)
{
    /* Do stuff */

    object cacheItem;
    // acquire lock per key
    uint hash = (uint)key.GetHashCode() % (uint)keyLocks.Length;
    while (Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) == 1) Thread.Yield();
    try
    {
        cacheItem = CacheProvider.GetOrCreate<object>(key, CacheFactory);
    }
    finally
    {
        keyLocks[hash] = 0;
    }

    /* Do more stuff */
}

Link to the source code.

The protected method call is CacheProvider.GetOrCreate<object>(key, CacheFactory). It is supposed to be called by one thread at a time for any given key. To enter the protected region, a while loop uses Interlocked.CompareExchange to change an element of the keyLocks array from 0 to 1. So far so good. The part that concerns me is the line that exits the protected region: keyLocks[hash] = 0;. Since there is no barrier there, my understanding is that the C# compiler and the .NET JIT compiler are free to move instructions in either direction across this line. So an instruction inside the CacheProvider.GetOrCreate method could be moved to after the keyLocks[hash] = 0;.

My question is: according to the specs, does the code above really ensure that CacheProvider.GetOrCreate will not be called concurrently with the same key? Is the promise of mutual exclusion fulfilled by this code? Or is the code just buggy?

Context: The relevant code was added to the library in this pull request: Optimize cache to lock per key.

– Theodor Zoulias

1 Answer


Looks buggy to me; the keyLocks[hash] = 0; is not a release store, so parts of the work inside the critical section can reorder out of it, potentially becoming visible to another thread only after that thread acquires the lock.

(Potentially reading already-modified data, or more likely having stores appear late and step on stores from the next thread, or not be seen by its loads.)

It will very likely compile to correct asm on x86, where all asm stores have "release" semantics, so only compile-time reordering could break things. That is not the case on ARM / AArch64 or other mainstream ISAs, which are weakly ordered. So testing on x86 can't reveal this bug unless you actually do get compile-time reordering. (It's still broken either way; the bug is just dormant.)

https://preshing.com/20121019/this-is-why-they-call-it-a-weakly-ordered-cpu/ demos a spinlock in C++ that uses relaxed instead of acquire / release, and shows that it breaks in practice on ARM. That example is exactly like this one, except that here the CAS is like C++ memory_order_seq_cst, so the top of the critical section is strong enough. But that's not sufficient; stronger ordering for taking the lock doesn't save you from too weak an unlock.
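For illustration, here is a rough C# analog of that experiment (my sketch, not code from the article). The writer publishes with a plain store, just like the keyLocks[hash] = 0; unlock, so on a weakly ordered CPU the flag can become visible before the data it's supposed to publish. On x86 the message should never print; on ARM / AArch64 it may print occasionally. (A serious harness would reuse long-lived threads instead of creating new ones every iteration; this just keeps the sketch short.)

using System;
using System.Threading;

class ReorderingDemo
{
    static int sharedValue;
    static int flag;

    static void Main()
    {
        for (int i = 1; i <= 200_000; i++)
        {
            sharedValue = 0;
            flag = 0;
            var writer = new Thread(() =>
            {
                sharedValue = 42; // plain store: the "critical section" work
                flag = 1;         // plain store: the "unlock"; may become visible first
            });
            var reader = new Thread(() =>
            {
                // Acquire load, so the reader side is not the weak link here.
                while (Volatile.Read(ref flag) == 0) { }
                if (sharedValue != 42)
                    Console.WriteLine($"Writer's stores were observed out of order on iteration {i}");
            });
            writer.Start();
            reader.Start();
            writer.Join();
            reader.Join();
        }
    }
}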


A basic spinlock needs an acquire RMW to get exclusive ownership, and a release store to unlock, hence the names. That's sufficient to keep the critical section's memory operations contained: the acquire RMW stops them from moving earlier than the lock acquisition, and the release store stops them from moving later than the unlock.

In C#, a release store can be done with Volatile.Write, or via assignment to a volatile field. (An array element can't be declared volatile, so Volatile.Write is the applicable option here.) My understanding is that those are equivalent to C++ foo.store(val, std::memory_order_release).
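To make that concrete, here is a minimal spinlock sketch (my illustration, not LazyCache's code) with the conventional acquire/release pairing:

using System.Threading;

// Minimal spinlock sketch. Caveat: like System.Threading.SpinLock,
// a mutable struct must not be copied; keep it in a field or array slot.
public struct KeySpinLock
{
    private int _taken; // 0 = free, 1 = held

    public void Enter()
    {
        // Acquire: Interlocked.CompareExchange is a full-barrier atomic RMW,
        // at least as strong as an acquire operation.
        while (Interlocked.CompareExchange(ref _taken, 1, 0) != 0)
            Thread.Yield();
    }

    public void Exit()
    {
        // Release: no memory operation from inside the critical section
        // can be reordered to after this store.
        Volatile.Write(ref _taken, 0);
    }
}

Applied to the snippet in the question, the one-line fix would be replacing keyLocks[hash] = 0; with Volatile.Write(ref keyLocks[hash], 0);.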

Related x86 asm examples and spinlock discussion:

  • Spinlock with XCHG unlocking (the unlock does not need to be an atomic RMW, and does not need to be as strong a barrier as Interlocked.Exchange, but it does need to be a release store)
  • Locks around memory manipulation via inline assembly - hand-written x86 asm, with some discussion of attempting to take the lock first and then spinning read-only, vs. starting with a read-only check, which optimizes for the contended case. The naive lock in the question keeps spamming atomic CAS attempts even when we haven't seen any evidence that the lock might be available (see the sketch after this list). It uses Thread.Yield() instead of SpinWait.SpinOnce(), which might be good if you have more threads than cores and critical sections tend to take a long time to unlock.
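Here is what the read-only spin could look like in C# (a sketch under the same int[] lock-table assumption as the question's code, not the library's actual implementation):

using System.Threading;

static class KeyLockHelpers
{
    // Test-and-test-and-set: spin read-only until the lock looks free,
    // and only then attempt the atomic CAS, instead of spamming RMWs.
    public static void Enter(int[] keyLocks, uint hash)
    {
        var spinner = new SpinWait();
        while (true)
        {
            if (Volatile.Read(ref keyLocks[hash]) == 0 &&
                Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) == 0)
                return; // lock acquired

            // SpinOnce() busy-spins briefly, then escalates to yielding,
            // adapting better than an unconditional Thread.Yield().
            spinner.SpinOnce();
        }
    }

    public static void Exit(int[] keyLocks, uint hash)
    {
        Volatile.Write(ref keyLocks[hash], 0); // release store
    }
}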
– Peter Cordes
  • Thanks Peter! A couple of hours ago I was about to submit a pull request for replacing the `keyLocks[hash] = 0;` (in four places) with `Interlocked.Exchange(ref keyLocks[hash], 0);`, but then I started having second thoughts. Maybe the existing code was actually safe, for some non-obvious reason. Thanks for clearing this up. Now I am more inclined to propose that they replace the buggy `int[] keyLocks;` with an `object[] keyLocks;`, and use the normal `lock` instead of the fancy `Interlocked`+`Thread.Yield`. – Theodor Zoulias Mar 14 '23 at 03:30
  • @TheodorZoulias: it's a table of spinlocks, using the same idea as [C++ `std::atomic` for objects so large that it can't be lock-free](https://stackoverflow.com/questions/50298358/where-is-the-lock-for-a-stdatomic). If you use the `lock` keyword you don't need a `keyLocks` array at all to protect `key` and other stuff that happens inside the `try{}`. Unless that's still a good way to get a bunch of different lockable objects to let different keys run concurrently, except when their `hash % n` collides. – Peter Cordes Mar 14 '23 at 03:35
  • @TheodorZoulias: If you use `lock` in a way that gives mutual exclusion so only one thread can be in the `try{}` block at any given time, you're reducing potential parallelism a lot. That's the reason the original had a table of spinlocks instead of just one. (Sorry, I don't know C# semantics, whether `lock` applies to a block or an object. If an object, and you can't put it into the internals of the `CacheProvider` on the actual hash bucket, then sure an array of objects that you `lock` as a proxy for locking the actual hash bucket is ok.) – Peter Cordes Mar 14 '23 at 03:37
  • I mean `lock (keyLocks[hash])`, after adding a `new object()` in each slot of the `keyLocks` array (see the sketch after these comments). I think it should have the same effect as the existing code, only not buggy. – Theodor Zoulias Mar 14 '23 at 03:39
  • @TheodorZoulias: I assume that should work, and might be a good approach. – Peter Cordes Mar 14 '23 at 03:40
  • I just submitted a [pull request](https://github.com/alastairtree/LazyCache/pull/187 "Improve the locking per key logic in the CachingService") to the alastairtree/LazyCache repository. – Theodor Zoulias Mar 14 '23 at 04:22
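
For reference, the object[]-based approach discussed in the comments could look roughly like this (illustrative names and sizes; see the actual pull request for the real change):

using System;
using System.Threading;

class PerKeyLockingSketch
{
    // One lock object per bucket; keys whose hashes collide modulo the
    // array length share a lock, while all other keys proceed concurrently.
    private readonly object[] keyLocks;

    public PerKeyLockingSketch(int lockCount = 256)
    {
        keyLocks = new object[lockCount];
        for (int i = 0; i < keyLocks.Length; i++)
            keyLocks[i] = new object();
    }

    public T GetOrAdd<T>(string key, Func<string, T> factory)
    {
        uint hash = (uint)key.GetHashCode() % (uint)keyLocks.Length;
        // The lock statement (Monitor) has acquire semantics on entry and
        // release semantics on exit, so no explicit barriers are needed.
        lock (keyLocks[hash])
        {
            return factory(key);
        }
    }
}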