8

I'm using StackExchange.Redis (SE.R henceforth) in my Nancy application. There is a single global ConnectionMultiplexer that gets automatically passed around by Nancy's TinyIoC via constructor parameters, and any time I try and use GetDatabase and one of the *Async methods (sync methods only start failing after one of the async methods have been attempted) my application deadlocks.

Looking at my parallel stacks it appears that I have four threads:

  1. The thread that called Result on one of my Tasks that uses SE.R. (Has lots of Nancy stuff on the stack, then a call to my library that utilizes SE.R, and a call to Result. Top of the stack is Monitor.Wait).
  2. A thread that spawned two other threads. I assume this is managed by SE.R. Starts with Native to Managed Transition, ThreadHelper.ThreadStart, and at the top of the stack is ThreadHelper.ThreadStart_Context.
  3. A small stack that is stuck like this:
    • Monitor.Wait
    • Monitor.Wait
    • SocketManager.WriteAllQueues
    • SocketManager.cctor.AnonymousMethod__16
  4. Another small stack that looks like this:
    • Managed to Native Transition
    • SocketManager.ReadImpl
    • SocketManager.Read
    • SocketManager.cctor.AnonymousMethod__19

I'm almost sure this is a deadlock of some sort. I even think it might have something to do with this question. But I have no idea what to do about it.

The ConnectionMultiplexer is set up in a Nancy IRegistrations with the following code:

var configOpts =  new ConfigurationOptions {
  EndPoints = {
    RedisHost,
  },
  Password = RedisPass,
  AllowAdmin = false,
  ClientName = ApplicationName,
  ConnectTimeout = 10000,
  SyncTimeout = 5000,
};
var mux = ConnectionMultiplexer.Connect(configOpts);
yield return new InstanceRegistration(typeof (ConnectionMultiplexer), mux);

mux is the instance that gets shared by all code requesting it in their constructor parameter list.

I have a class called SchemaCache. A small piece of it (that includes the code that throws the error in question) follows:

public SchemaCache(ConnectionMultiplexer connectionMultiplexer) {
    ConnectionMultiplexer = connectionMultiplexer;
}

private ConnectionMultiplexer ConnectionMultiplexer { get; set; }

private async Task<string[]> Cached(string key, bool forceFetch, Func<string[]> fetch) {
    var db = ConnectionMultiplexer.GetDatabase();

    return forceFetch || !await db.KeyExistsAsync(key)
        ? await CacheSetSet(db, key, await Task.Run(fetch))
        : await CacheGetSet(db, key);
}

private static async Task<string[]> CacheSetSet(IDatabaseAsync db, string key, string[] values) {
    await db.KeyDeleteAsync(key);
    await db.SetAddAsync(key, EmptyCacheSentinel);

    var keysSaved = values
        .Append(EmptyCacheSentinel)
        .Select(val => db.SetAddAsync(key, val))
        .ToArray()
        .Append(db.KeyExpireAsync(key, TimeSpan.FromDays(1)));
    await Task.WhenAll(keysSaved);

    return values;
}

private static async Task<string[]> CacheGetSet(IDatabaseAsync db, string key) {
    var results = await db.SetMembersAsync(key);
    return results.Select(rv => (string) rv).Without(EmptyCacheSentinel).ToArray();
}

// There are a bunch of these public methods:
public async Task<IEnumerable<string>> UseCache1(bool forceFetch = false) {
    return await Cached("the_key_i_want", forceFetch, () => {
        using (var cnn = MakeConnectionToDatabase("server", "databaseName")) {
            // Uses Dapper:
            return cnn.Query<string>("--expensive sql query").ToArray();
        }
    });
}

I also have a class that makes uses of this in a method requiring some of the information from the cache:

public OtherClass(SchemaCache cache) {
    Cache = cache;
}

private SchemaCache Cache { get; set; }

public Result GetResult(Parameter parameter) {
    return Cache.UseCache1().Result
        .Where(r => Cache.UseCache2(r).Result.Contains(parameter))
        .Select(r => CheckResult(r))
        .FirstOrDefault(x => x != null);
}

All of the above works fine in LinqPad where there's only one instance of everything in question. Instead it fails with a TimeoutException (and later an exception about no available connection). The only difference is that I get an instance of the cache via dependency injection, and I'm pretty sure Nancy uses Tasks to parallelize the requests.

Community
  • 1
  • 1
Chris Pfohl
  • 18,220
  • 9
  • 68
  • 111
  • Still trying to summarize. – Chris Pfohl Dec 01 '14 at 19:40
  • Are you by any chance using a transaction or batch? There's a really simple way to deadlock yourself there (that is just as easily fixed) – Marc Gravell Dec 01 '14 at 20:08
  • @MarcGravell Nope...still working up the rest of the example...there's a lot of moving pieces... – Chris Pfohl Dec 01 '14 at 20:14
  • Those two stacks are the socket IO stubs; they mean nothing (or rather: they are normal and expected); what would be more useful is the output of `GetCounters()` – Marc Gravell Dec 01 '14 at 20:18
  • `GetCounters` from which class? – Chris Pfohl Dec 01 '14 at 20:20
  • {Total: int ops=13, qu=0, qs=0, qc=0, wr=0, sync=10, async=3, socks=1; sub ops=4, qu=0, qs=0, qc=0, wr=0, subs=1, sync=4, socks=1} EndPoint: null Interactive: {ops=13, qu=0, qs=0, qc=0, wr=0, sync=10, async=3, socks=1} Other: {ops=0, qu=0, qs=0, qc=0, wr=0} Subscription: {ops=4, qu=0, qs=0, qc=0, wr=0, subs=1, sync=4, socks=1} TotalOutstanding: 0 – Chris Pfohl Dec 01 '14 at 20:30
  • That is after the method gets called (and stuck) once, then I press `Post` again on my form so a second request starts up. I can't seem to figure out how to get access to the mux otherwise... – Chris Pfohl Dec 01 '14 at 20:32
  • 1
    @ChristopherPfohl that basically says the muxer isn't active in any way; everything has completed and handed back to either the caller (for sync) or the TPL (for async). I wonder if this is simply: not using TPL quite correctly. I would have to see the calling code to comment. – Marc Gravell Dec 01 '14 at 21:53
  • 1
    btw; if you're already in async land; dapper has an async API too ;p – Marc Gravell Dec 01 '14 at 21:55

1 Answers1

15

Here we go:

return Cache.UseCache1().Result

boom; deadlock. But nothing to do with StackExchange.Redis.

At least, from most sync-context providers. This is because your awaits are all implicitly requesting sync-context activation - which can mean "on the UI thread" (winforms, WPF), or "on the currently designated worker thread" (WCF, ASP.NET, MVC, etc). The problem here is that this thread will never be available to process those items, because .Result is a synchronous and blocking call. Thus none of your completion callbacks will get processed, because the only thread that can possibly process them is waiting for them to finish before making itself available.

Note: StackExchange.Redis does not use the sync-context; it explicitly disconnects from sync-context to avoid being the cause of deadlocks (this is normal and recommended for libraries). The key point is that your code doesn't.

Options:

  • don't mix async and .Result / .Wait(), or
  • have all of your await calls (or at least those underneath .UseCache1()) explicitly call .ConfigureAwait(false) - note, however, that this means that the completions will not occur in the calling context!

The first option is the simplest; if you can isolate a tree of calls that do not depend on sync-context then the second approach is viable.

This is a very common problem; I did pretty much the same thing.

Community
  • 1
  • 1
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900