20

Update: It is acceptable if this method is not thread safe, but I'm interested in learning how I would make it thread safe. Also, I do not want to lock on a single object for all values of key if I can avoid it.

Original Question: Suppose I want to write a higher-order function that takes a key and a function, and checks if an object has been cached with the given key. If it has, the cached value is returned. Otherwise, the given function is run and the result is cached and returned.

Here's a simplified version of my code:

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    object cache = HttpContext.Current.Cache.Get(key);
    //clearly not thread safe, two threads could both evaluate the below condition as true
    //what can I lock on since the value of "key" may not be known at compile time?
    if (cache == null)
    {
        T result = fn();
        HttpContext.Current.Cache.Insert(key, result, null, expires, Cache.NoSlidingExpiration);
        return result;
    }
    else
        return (T)cache;
}

Also, suppose I do not know all possible values of key at compile time.

How can I make this thread safe? I know I need to introduce locking here, to prevent more than one thread from evaluating my condition as true, but I don't know what to lock on. Many of the examples I've read about locking (such as Jon Skeet's article) recommend using a "dummy" private variable that's used only for locking. This isn't possible in this case, because keys are unknown at compile time. I know I could trivially make this thread safe by having the same lock be used for every key, but that could be wasteful.
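
For reference, the "dummy" lock pattern those examples describe looks roughly like this (just a sketch of the general pattern, not my actual code):

private static readonly object _padlock = new object(); //used only for locking

public static void DoSomething()
{
    lock (_padlock)
    {
        //critical section
    }
}

That works when there is a single, well-known resource to protect, but here I would effectively need a separate lock per value of key.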

Now, my main question is:

Is it possible to lock on key? Will string interning help here?

After reading .NET 2.0 string interning inside out, I understand that I can explicitly call String.Intern() to obtain a one-to-one mapping from the value of a string to an instance of a string. Is this suitable to lock on? Let's change the above code to:

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    //check for the scenario where two strings with the same value are stored at different memory locations
    key = String.Intern(key); 
    lock (key) //is this object suitable for locking?
    {
        object cache = HttpContext.Current.Cache.Get(key);
        if (cache == null)
        {
            T result = fn();
            HttpContext.Current.Cache.Insert(key, result, null, expires, Cache.NoSlidingExpiration);
            return result;
        }
        else
            return (T)cache;
    }
}

Is the above implementation thread safe?

wsanville

6 Answers

24

Problems with @wsanville's own solution, partly mentioned before:

  1. other parts of your code base might lock on the same interned string instances for different purposes (see the sketch after this list), causing at best performance problems and at worst deadlocks if you are unlucky - potentially only in the future, as the code base grows and is extended by coders unaware of your String.Intern locking pattern. Note that this includes locks on the same interned string even if they are taken in different AppDomains, potentially leading to cross-AppDomain deadlocks
  2. it's impossible to reclaim the interned memory should you ever decide to do so
  3. String.Intern() is slow
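
To make point 1 concrete, here is a small, purely hypothetical illustration (the ComponentA/ComponentB names are made up): two unrelated pieces of code lock on equal string literals, and since literals are interned, both lock on the very same instance, so they block each other even though they were never meant to share a lock.

class ComponentA
{
    public void RefreshConfig()
    {
        lock ("config") // the literal is interned: one shared instance
        {
            // long-running work
        }
    }
}

class ComponentB // written by someone unaware of ComponentA
{
    public void SaveConfig()
    {
        lock ("config") // same interned instance as in ComponentA
        {
            // unrelated work, yet serialized with ComponentA.RefreshConfig()
        }
    }
}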

To address all three of these issues, you could implement your own Intern() that you tie to your specific locking purpose, i.e. one that is not used as a global, general-purpose string interner:

private static readonly ConcurrentDictionary<string, string> concSafe = 
    new ConcurrentDictionary<string, string>();
static string InternConcurrentSafe(string s)
{
    return concSafe.GetOrAdd(s, String.Copy);
}

I called this method ...Safe() because, when interning, it will not store the passed-in String instance, as that might, for example, be an already interned String, making it subject to the problems mentioned in point 1 above.

To compare the performance of various ways of interning strings, I also tried the following two methods, as well as String.Intern.

private static readonly ConcurrentDictionary<string, string> conc = 
    new ConcurrentDictionary<string, string>();
static string InternConcurrent(string s)
{
    return conc.GetOrAdd(s, s);
}

private static readonly Dictionary<string, string> locked = 
    new Dictionary<string, string>(5000);
static string InternLocked(string s)
{
    string interned;
    lock (locked)
        if (!locked.TryGetValue(s, out interned))
            interned = locked[s] = s;
    return interned;
}

Benchmark

100 threads, each randomly selecting one of 5000 different strings (each containing 8 digits) 50000 times and then calling the respective intern method. All values were measured after warming up sufficiently, on Windows 7 64-bit with a 4-core i5.

N.B. Warming up the above setup implies that after warming up, there won't be any writes to the respective interning dictionaries, but only reads. It's what I was interested in for the use case at hand, but different write/read ratios will probably affect the results.
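
For reference, here is a minimal sketch of a harness matching that setup (my own reconstruction, not the exact code that produced the numbers below; it assumes the usual System.Diagnostics, System.Linq and System.Threading usings):

static readonly string[] keys = Enumerable.Range(0, 5000)
    .Select(i => i.ToString("D8")) // 5000 distinct strings of 8 digits
    .ToArray();

static long Benchmark(Func<string, string> intern)
{
    // warm-up: intern every key once, so the timed phase is read-only
    foreach (string key in keys)
        intern(key);

    Thread[] threads = Enumerable.Range(0, 100).Select(t => new Thread(() =>
    {
        var rnd = new Random(t);
        for (int i = 0; i < 50000; i++)
            intern(keys[rnd.Next(keys.Length)]);
    })).ToArray();

    var sw = Stopwatch.StartNew();
    foreach (Thread thread in threads) thread.Start();
    foreach (Thread thread in threads) thread.Join();
    sw.Stop();
    return sw.ElapsedMilliseconds; // e.g. Benchmark(InternConcurrentSafe)
}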

Results

  • String.Intern(): 2032 ms
  • InternLocked(): 1245 ms
  • InternConcurrent(): 458 ms
  • InternConcurrentSafe(): 453 ms

The fact that InternConcurrentSafe is as fast as InternConcurrent makes sense in light of the fact that these figures are after warming up (see above N.B.), so there are in fact no or only a few invocations of String.Copy during the test.


In order to properly encapsulate this, create a class like this:

public class StringLocker
{
    private readonly ConcurrentDictionary<string, string> _locks =
        new ConcurrentDictionary<string, string>();

    public string GetLockObject(string s)
    {
        return _locks.GetOrAdd(s, String.Copy);
    }
}

and after instantiating one StringLocker for every use case you might have, it is as easy as calling

lock(myStringLocker.GetLockObject(s))
{
    ...
}

N.B.

Thinking again, there's no need to return an object of type string if all you want to do is lock on it, so copying the characters is totally unnecessary, and the following would perform better than the class above.

public class StringLocker
{
    private readonly ConcurrentDictionary<string, object> _locks =
        new ConcurrentDictionary<string, object>();

    public object GetLockObject(string s)
    {
        return _locks.GetOrAdd(s, k => new object());
    }
}
Evgeniy Berezovsky
  • Whoa, thanks for pointing out the cross-AppDomain sharing, as well as a good counterexample. That article was pretty helpful. – wsanville Oct 15 '13 at 17:01
  • Honestly, it's been about 2 years since I worked with .NET sites, so I don't exactly remember how we had our sites configured (with respect to AppDomains and processes). This means that if my locking solution was running in multiple websites on the same machine, then this could result in forcing a thread from site A to wait on a lock held by a thread in site B if they run in the same process, right? – wsanville Oct 15 '13 at 17:05
  • Damn, that is scary indeed. Going with your answer for the accept, since you did the best job of providing a counterexample for this. Thanks! – wsanville Oct 17 '13 at 18:06
  • I need to release the lock object's memory once all threads with the same string have exited the lock block, because my strings are unlimited and I only need the lock object while a thread is inside the block. How can I achieve that? – Hamid Feb 28 '15 at 17:08
  • @Hamid That has to be done differently, and merits its own question. If you create one (and e.g. comment here so that I notice it) I will have an answer for you. – Evgeniy Berezovsky Mar 02 '15 at 00:01
  • This looks like a very neat solution, however I believe it won't play nice with `GetOrAdd`, as it is not quite thread-safe. You may end up creating two copies for the same string, which would allow code for the same string key to run concurrently when you expect it to run serially. See this topic for more info: https://stackoverflow.com/questions/10486579/concurrentdictionary-pitfall-are-delegates-factories-from-getoradd-and-addorup – Ivaylo Slavov Nov 21 '20 at 15:42
  • 1
    @IvayloSlavov The fact that the code executed by `GetOrAdd()` may get executed concurrently, without synchronization, does not mean that this is not thread-safe. It is thread-safe, even when multiple strings (or Objects, in my final solution) get created, because the same string/Object instance will be returned by all concurrent GetOrAdd calls. (Only one instance will be used, the others will be discarded) – Evgeniy Berezovsky Nov 24 '20 at 22:04
  • @EvgeniyBerezovsky, as I became aware of the `GetOrAdd` issue, I tend to use it with caution -- especially because there are no safety guarantees for side effects of the `valueFactory` lambda. To be honest, I never really thought about what the dictionary does with each produced value -- whether it returns the same instance every time or the last one created takes over. It looked a lot like nondeterministic eventual consistency, and I have for a long time assumed it to be so, in order to prevent myself from relying on it. Since then `GetOrAdd` raises a red flag for me in some scenarios. – Ivaylo Slavov Nov 26 '20 at 08:30
  • I suppose you are right, and the dictionary is deterministic in that it stores and consistently returns the same value despite the many factory calls -- then it is perfectly safe in the above situation and I am too wary about it. Nevertheless, I needed to make sure this is not a concern here. – Ivaylo Slavov Nov 26 '20 at 08:30
  • Amazing answer, Appreciated !! – Taha Ali Feb 03 '22 at 17:05
6

A variant of Daniel's answer...

Rather than creating a new lock object for every single string you could share a small-ish set of locks, choosing which lock to use depending on the string's hashcode. This will mean less GC pressure if you potentially have thousands, or millions, of keys, and should allow enough granularity to avoid any serious blocking (perhaps after a few tweaks, if necessary).

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    object cached = HttpContext.Current.Cache[key];
    if (cached != null)
        return (T)cached;

    int stripeIndex = (key.GetHashCode() & 0x7FFFFFFF) % _stripes.Length;

    lock (_stripes[stripeIndex])
    {
        // double-check inside the lock in case another thread cached this key
        // while we were waiting on the stripe
        cached = HttpContext.Current.Cache[key];
        if (cached != null)
            return (T)cached;

        T result = fn();
        HttpContext.Current.Cache.Insert(key, result, null, expires,
                                         Cache.NoSlidingExpiration);
        return result;
    }
}

// share a set of 32 locks
private static readonly object[] _stripes = Enumerable.Range(0, 32)
                                                      .Select(x => new object())
                                                      .ToArray();

This will allow you to tweak the locking granularity to suit your particular needs just by changing the number of elements in the _stripes array. (However, if you need close to one-lock-per-string granularity then you're better off going with Daniel's answer.)

LukeH
  • Very smart trade off between performance and memory. Also found this approach discussed here: http://blog.axantum.com/2008/01/lock-object-sharing-with-hashes.html – nickwesselman Dec 06 '12 at 19:18
  • What's the purpose of the bitwise AND against 0x7FFFFFFF? – nickwesselman Dec 06 '12 at 20:38
  • 1
    @techphoria414: The value returned from `GetHashCode` can be positive or negative, but we can't index into the `_stripes` array using a negative index. *And*-ing with `0x7FFFFFFF` will ensure that the index is always positive. – LukeH Dec 07 '12 at 11:01
  • Love this - definitely addresses the more common ConcurrentDictionary approach's pitfall about consuming a bunch of memory over time. +1 – Matt Whitfield Mar 21 '18 at 18:49
3

Never lock on strings, in particular not on those that are interned. See this blog entry on the danger of locking on interned strings.

Just create a new object and lock on that:

object myLock = new object();
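
Applied to the method from the question, that would look something like the following sketch: a single static lock guarding the whole cache, which means every key contends for the same lock (the trade-off discussed in the comments below).

private static readonly object _cacheLock = new object();

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    lock (_cacheLock)
    {
        object cached = HttpContext.Current.Cache.Get(key);
        if (cached != null)
            return (T)cached;

        T result = fn();
        HttpContext.Current.Cache.Insert(key, result, null, expires, Cache.NoSlidingExpiration);
        return result;
    }
}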
Oded
  • 2
    I want to avoid locking on the same object for every key. I don't want my method to block everything while one long-running and potentially unrelated call finishes. – wsanville Aug 08 '11 at 14:40
  • @wsanville normally you don't have parameterised locking - it's a recipe for deadlocks in future where you can't see what's locked and why. The idea normally is to lock access to something singular, in your case the Cache. If you need specific locking for certain calls, then I'd suggest putting these calls inside their own code path and locking access to that path separately, but still you are only locking for one thing. – Adam Houldsworth Aug 08 '11 at 14:45
  • 1
    So I read that article, and from my understanding, the only danger is the misuse of locking on strings and not understanding string interning. `lock (LockB)` and `lock (LockC)` look like two different objects, but in that example they are the same due to string interning. This is the behavior I'm looking for, and I'm curious to know if there's anything I'm overlooking. – wsanville Aug 08 '11 at 15:01
3

According to the documentation, the Cache type is thread safe. So the downside of not synchronizing yourself is that while an item is being created, it may be created a few times before the other threads realize they don't need to create it.

If the situation is simply to cache common static / read-only things, then don't bother synchronizing just to save the odd few collisions that might occur. (Assuming the collisions are benign.)

The locking object won't be specific to the strings, it will be specific to the granularity of the lock you require. In this case, you are trying to lock access to the cache, so one object would serve to lock the cache. Locking on the specific key that comes in isn't what locking is usually concerned with.

If you want to stop expensive calls from occurring multiple times, then you can rip the loading logic out into a new class LoadMillionsOfRecords, call .Load and lock once on an internal locking object as per Oded's answer.
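
A rough sketch of that idea (everything except the LoadMillionsOfRecords name is made up for illustration):

public class LoadMillionsOfRecords
{
    // internal locking object, as per Oded's answer
    private readonly object _loadLock = new object();
    private IList<object> _records;

    public IList<object> Load()
    {
        // only this known-expensive code path pays the cost of locking
        lock (_loadLock)
        {
            if (_records == null)
                _records = FetchRecords(); // placeholder for the expensive call
            return _records;
        }
    }

    private IList<object> FetchRecords()
    {
        // placeholder: load the records from the database/service
        return new List<object>();
    }
}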

Adam Houldsworth
  • What do you mean by "The locking object won't be specific to the strings, it will be specific to the granularity of the lock you require" ? I don't want 1 lock for everything, I'm looking for 1 lock for each potential value of `key`. Also, I should mention that it's not the end of the world if it's not thread safe, I'm just curious about how I would go about implementing it. – wsanville Aug 08 '11 at 15:09
  • @wsanville I was trying to say you lock access to a resource, in your case Cache. However you then mention that some keys are expensive and some aren't. I'd steer well clear of locking on an argument to a method personally, and make the code path more explicit for known expensive calls - you can then only have one lock in that path for loading that expensive data. – Adam Houldsworth Aug 08 '11 at 15:41
3

I would go with the pragmatic approach and use the dummy variable.
If this is not possible for whatever reason, I would use a Dictionary<TKey, TValue> with key as the key and a dummy object as the value and lock on that value, because strings are not suitable for locking:

private static readonly object _syncRoot = new Object();
private static readonly Dictionary<string, object> _syncRoots = new Dictionary<string, object>();

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    object keySyncRoot;
    lock(_syncRoot)
    {

        if(!_syncRoots.TryGetValue(key, out keySyncRoot))
        {
            keySyncRoot = new object();
            _syncRoots[key] = keySyncRoot;
        }
    }
    lock(keySyncRoot)
    {

        object cache = HttpContext.Current.Cache.Get(key);
        if (cache == null)
        {
            T result = fn();
            HttpContext.Current.Cache.Insert(key, result, null, expires, 
                                             Cache.NoSlidingExpiration);
            return result;
        }
        else
            return (T)cache;
    }
}

However, in most cases this is overkill and unnecessary micro optimization.

Daniel Hilgarth
  • 2
    +1, but depending on the exact usage pattern that could potentially be a lot of objects that will never be collected. – LukeH Aug 08 '11 at 14:48
  • @LukeH: Correct. A cleanup routine would be needed additionally. – Daniel Hilgarth Aug 08 '11 at 14:50
  • I've thought about doing exactly that, having a dictionary of lock objects, and lock on access to that dictionary. I'm curious if I can get by without implementing it, because it does seem like overkill, and it's not the end of the world if 1+ threads end up duplicating work. – wsanville Aug 08 '11 at 14:58
  • @wsanville: If some duplicate work isn't the end of the world and/or it won't happen very often then I wouldn't bother with locking at all. – LukeH Aug 08 '11 at 15:57
  • @DanielHilgarth - why does your 2nd lock use a local variable? Local variables are guaranteed to be thread-safe. I can't see how this accomplishes anything. I must be missing something. – Dave Black Oct 29 '14 at 20:59
  • @DaveBlack: Have a look at where the value of this variable is coming from. It is not always a new instance. – Daniel Hilgarth Oct 29 '14 at 21:02
0

I added a solution to the Bardock.Utils package, inspired by @eugene-beresovsky's answer.

Usage:

private static LockeableObjectFactory<string> _lockeableStringFactory = 
    new LockeableObjectFactory<string>();

string key = ...;

lock (_lockeableStringFactory.Get(key))
{
    ...
}

Solution code:

using System.Collections.Concurrent;

namespace Bardock.Utils.Sync
{
    /// <summary>
    /// Creates objects based on instances of TSeed that can be used to acquire an exclusive lock.
    /// Instantiate one factory for every use case you might have.
    /// Inspired by Eugene Beresovsky's solution: https://stackoverflow.com/a/19375402
    /// </summary>
    /// <typeparam name="TSeed">Type of the object you want lock on</typeparam>
    public class LockeableObjectFactory<TSeed>
    {
        private readonly ConcurrentDictionary<TSeed, object> _lockeableObjects = new ConcurrentDictionary<TSeed, object>();

        /// <summary>
        /// Creates or uses an existing object instance by specified seed
        /// </summary>
        /// <param name="seed">
        /// The object used to generate a new lockeable object.
        /// The default EqualityComparer<TSeed> is used to determine if two seeds are equal. 
        /// The same object instance is returned for equal seeds, otherwise a new object is created.
        /// </param>
        public object Get(TSeed seed)
        {
            return _lockeableObjects.GetOrAdd(seed, valueFactory: x => new object());
        }
    }
}
bardock