7

I have an ASP.NET site with a fairly slow search function, and I want to improve performance by adding the results to the cache for one hour using the query as the cache-key:

using System;
using System.Web;
using System.Web.Caching;

public class Search
{
    private static object _cacheLock = new object();

    public static string DoSearch(string query)
    {
        string results = "";

        if (HttpContext.Current.Cache[query] == null)
        {
            lock (_cacheLock)
            {
                if (HttpContext.Current.Cache[query] == null)
                {
                    results = GetResultsFromSlowDb(query);

                    HttpContext.Current.Cache.Add(query, results, null, DateTime.Now.AddHours(1), Cache.NoSlidingExpiration, CacheItemPriority.Normal, null);
                }
                else
                {
                    results = HttpContext.Current.Cache[query].ToString();
                }
            }
        }
        else
        {
            results = HttpContext.Current.Cache[query].ToString();
        }

        return results;
    }

    private static string GetResultsFromSlowDb(string query)
    {
        return "Hello World!";
    }
}

Let’s say visitor A does a search. The cache is empty, the lock is taken and the result is requested from the database. Now visitor B comes along with a different search: won’t visitor B have to wait at the lock until visitor A’s search has completed? What I really want is for B to call the database immediately, because the results will be different and the database can handle multiple requests – I just don’t want to repeat expensive, unnecessary queries.

What would be the correct approach for this scenario?

Jakob Gade
  • Are the queries really so expensive and/or your site so busy that you can't afford a few redundant duplicate queries once per hour? (And that situation would only arise if, and only if, two or more queries hit your method almost simultaneously once the cache has expired.) – LukeH Apr 07 '11 at 09:22
  • If your database does not support multiple read access, you can implement a message queue, so the DB serves A, then serves B... while serving, check the cache. – Winfred Apr 07 '11 at 09:28
  • @LukeH, There's so much going on in that particular database, so any load we can take off it is worth the effort. – Jakob Gade Apr 07 '11 at 09:28
  • Is there only one web server? If not, you could use a distributed cache to greater effect. – Chris Pitman Apr 07 '11 at 12:35
  • @Chris, just a single webserver so far. – Jakob Gade Apr 07 '11 at 13:12

3 Answers

26

Unless you're absolutely certain that it's critical to have no redundant queries then I would avoid locking altogether. The ASP.NET cache is inherently thread-safe, so the only drawback to the following code is that you might temporarily see a few redundant queries racing each other when their associated cache entry expires:

public static string DoSearch(string query)
{
    var results = (string)HttpContext.Current.Cache[query];
    if (results == null)
    {
        results = GetResultsFromSlowDb(query);

        HttpContext.Current.Cache.Insert(query, results, null,
            DateTime.Now.AddHours(1), Cache.NoSlidingExpiration);
    }
    return results;
}

If you decide that you really must avoid all redundant queries then you could use a set of more granular locks, one lock per query:

public static string DoSearch(string query)
{
    var results = (string)HttpContext.Current.Cache[query];
    if (results == null)
    {
        object miniLock = _miniLocks.GetOrAdd(query, k => new object());
        lock (miniLock)
        {
            results = (string)HttpContext.Current.Cache[query];
            if (results == null)
            {
                results = GetResultsFromSlowDb(query);

                HttpContext.Current.Cache.Insert(query, results, null,
                    DateTime.Now.AddHours(1), Cache.NoSlidingExpiration);
            }

            object temp;
            if (_miniLocks.TryGetValue(query, out temp) && (temp == miniLock))
                _miniLocks.TryRemove(query, out temp);
        }
    }
    return results;
}

// Requires .NET 4 and using System.Collections.Concurrent;
private static readonly ConcurrentDictionary<string, object> _miniLocks =
                                  new ConcurrentDictionary<string, object>();
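
For completeness, a variant of the same per-key idea (an editor's sketch, not part of the original answer): on .NET 4 you can store Lazy<string> values in the ConcurrentDictionary, so the slow call itself is guaranteed to run only once per key while different keys proceed in parallel. QuerySlowDb and DbCalls are hypothetical stand-ins for the real database call; expiry is not handled here, so in a real site you would still copy the result into the ASP.NET cache afterwards.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class PerKeyMemo
{
    // One Lazy<string> per query key. GetOrAdd may build a few Lazy
    // wrappers during a race, but only one of them is stored, and only
    // the stored one ever has its Value read, so the slow factory runs
    // at most once per key.
    private static readonly ConcurrentDictionary<string, Lazy<string>> _perQuery =
        new ConcurrentDictionary<string, Lazy<string>>();

    public static int DbCalls; // counts hits on the fake "database"

    public static string DoSearch(string query)
    {
        Lazy<string> lazy = _perQuery.GetOrAdd(query,
            q => new Lazy<string>(() => QuerySlowDb(q),
                                  LazyThreadSafetyMode.ExecutionAndPublication));
        return lazy.Value; // blocks only callers asking for the *same* key
    }

    private static string QuerySlowDb(string query)
    {
        Interlocked.Increment(ref DbCalls);
        Thread.Sleep(100); // simulate the slow query
        return "results for " + query;
    }

    public static void Main()
    {
        // Ten racing calls per key, two distinct keys.
        var tasks = new Task<string>[20];
        for (int i = 0; i < 20; i++)
        {
            string q = (i % 2 == 0) ? "foo" : "bar";
            tasks[i] = Task.Factory.StartNew(() => DoSearch(q));
        }
        Task.WaitAll(tasks);
        Console.WriteLine(DbCalls); // prints 2: one real query per key
    }
}
```

The trade-off against the _miniLocks version above is that this dictionary grows by one entry per distinct query; you would need some eviction scheme to keep it bounded.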
LukeH
  • Awesome. I'll see if I can cook something similar up for .NET 3.5 (ConcurrentDictionary is only supported in .NET 4). But I'll probably go with your first suggestion until we upgrade. Thanks. :) – Jakob Gade Apr 07 '11 at 12:35
  • @LukeH other than space if there are tons of different queries, is it really needed to remove from _miniLocks? – eglasius Jan 25 '12 at 21:29
  • @eglasius: No, it's just an attempt to free up space. – LukeH Jan 26 '12 at 00:48
  • cool, interesting solution btw (concurrentdictionary to keep locks x key) – eglasius Jan 26 '12 at 03:07
  • +1 for emphasizing what nearly every cache-locking example seems to neglect: proper lock selection. It's access to one particular cached item that you want to lock, not the entire cache! – Todd Menier Sep 30 '12 at 02:07
  • Please be aware that the GetOrAdd method has an overload with a delegate and one with an object. It seems the delegate version has different locking rules for thread safety. Reference: https://msdn.microsoft.com/en-us/library/dd287191(v=vs.110).aspx#Anchor_7 – BenMaddox Aug 23 '16 at 14:24
9

Your code has a potential race condition:

if (HttpContext.Current.Cache[query] == null)
{
    ...
}
else
{
    // When you get here, another thread may have removed the item from the cache
    // so this may still return null.
    results = HttpContext.Current.Cache[query].ToString();
}

In general I wouldn't use locking, and would do it as follows to avoid the race condition:

var results = (string)HttpContext.Current.Cache[query];
if (results == null)
{
    results = GetResultsFromSomewhere();
    HttpContext.Current.Cache.Add(query, results, ...);
}
return results;

In the above case, multiple threads might attempt to load data if they detect a cache miss at about the same time. In practice this is likely to be rare, and in most cases unimportant, because the data they load will be equivalent.
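
That benign race can be seen in a small stand-alone sketch (an editor's illustration, with a ConcurrentDictionary standing in for the ASP.NET cache and a hypothetical Load method as the slow call): several threads may see the miss at once, each loads, every caller still gets a correct result, and the last write simply wins.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class RedundantLoadDemo
{
    // Thread-safe stand-in for HttpContext.Current.Cache.
    static readonly ConcurrentDictionary<string, string> _cache =
        new ConcurrentDictionary<string, string>();

    public static int Loads; // how many times the slow load actually ran

    public static string Get(string query)
    {
        string results;
        if (!_cache.TryGetValue(query, out results)) // cache miss
        {
            results = Load(query);   // possibly redundant under a race
            _cache[query] = results; // thread-safe; last writer wins
        }
        return results;
    }

    static string Load(string query)
    {
        Interlocked.Increment(ref Loads);
        Thread.Sleep(50); // simulate the slow query
        return "results for " + query;
    }

    public static void Main()
    {
        var tasks = new Task<string>[5];
        for (int i = 0; i < 5; i++)
            tasks[i] = Task.Factory.StartNew(() => Get("foo"));
        Task.WaitAll(tasks);

        foreach (var t in tasks)
            Console.WriteLine(t.Result); // every caller sees correct results

        // Anywhere from 1 to 5 loads, depending on timing.
        Console.WriteLine(Loads + " load(s)");
    }
}
```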

But if you want to use a lock to prevent it you can do so as follows:

var results = (string)HttpContext.Current.Cache[query];
if (results == null)
{
    lock (someLock)
    {
        results = (string)HttpContext.Current.Cache[query];
        if (results == null)
        {
            results = GetResultsFromSomewhere();
            HttpContext.Current.Cache.Add(query, results, ...);
        }
    }
}
return results;
Joe
0

Your code is correct. You are also using the double-checked locking pattern, which prevents the race conditions that are a common pitfall when it is omitted. It does not lock access to items that already exist in the cache.

The only problem is that when many clients insert into the cache at the same time, they will queue behind the lock. What I would do is move results = GetResultsFromSlowDb(query); outside the lock:

public static string DoSearch(string query)
{
    string results = "";

    if (HttpContext.Current.Cache[query] == null)
    {
        results = GetResultsFromSlowDb(query); // HERE
        lock (_cacheLock)
        {
            if (HttpContext.Current.Cache[query] == null)
            {
                HttpContext.Current.Cache.Add(query, results, null, DateTime.Now.AddHours(1), Cache.NoSlidingExpiration, CacheItemPriority.Normal, null);
            }
            else
            {
                results = HttpContext.Current.Cache[query].ToString();
            }
        }
    }
    else
    {
        results = HttpContext.Current.Cache[query].ToString();
    }

    return results;
}

If this is slow, your problem is elsewhere.

Aliostad
  • Thanks. But are you sure visitor B won't have to wait until visitor A is finished? – Jakob Gade Apr 07 '11 at 09:32
  • Won't moving the GetResultsFromSlowDb outside the lock defeat the purpose of the double-cache-check? Multiple visitors can start the same query if they enter before the first visitor is done. – Jakob Gade Apr 07 '11 at 09:36
  • No. Your cache stays there for an hour. If two clients try getting exactly the **same keys**, inside or outside the lock, they have to wait; only your DB is called less. But this will allow clients to happily get **different keys** without having to wait. – Aliostad Apr 07 '11 at 09:42
  • @Aliostad: The lock serves no useful purpose in your edit. All that happens is that you discard the results from the redundant queries after the expense of running them. You'd be better off getting rid of the lock and just letting the redundant queries all update the cache as they complete. (The ASP.NET cache is threadsafe.) – LukeH Apr 07 '11 at 09:44
  • Of course it does. Let's say I am going after key 'foo' and you are going after key 'bar'. If you go first, my query for key 'foo' does not run in parallel while your query runs, since I am behind the lock. That is the whole point: the queries can now run in parallel, even if some query results won't be used. – Aliostad Apr 07 '11 at 09:46
  • @Aliostad: Exactly. So what does having a lock do that wouldn't happen without a lock? The only difference that I can see is that with the lock the first result for "foo" will be in the cache after all the racing queries have completed; without the lock the last result for "foo" will be in the cache after all the racing queries have completed. (In the second case each result will be added to the cache as it completes, and remain there until the next result overwrites it, but that will all happen in an atomic, thread-safe manner.) – LukeH Apr 07 '11 at 09:56
  • Not a lot to be honest :) Taking a write lock is usually good practice when it comes to caches but here since the data is the same, it would not make a difference – Aliostad Apr 07 '11 at 10:05
  • I really appreciate the discussion, but I don’t think my original question was answered: Will B have to wait until A is finished? If yes, then I’d prefer to skip the locking and just take the extra (unlikely) hit on the database as suggested by @LukeH in the question comments. That would be preferable over having B waiting for A. – Jakob Gade Apr 07 '11 at 10:24
  • @Jakob: As I said, using your original code, yes, it would wait; using mine – or no lock at all – it won't. – Aliostad Apr 07 '11 at 10:26