10

I'm experiencing the so-called Timeout performing HGET company:product:settings, inst: 1, queue: 8, qu=0, qs=8, qc=0, wr=0/0, in=79/1 timeout exception.

It's strange because the same Redis instance and in the same machine is storing data, but it's a specific application which throws this exception. Update: In fact, the same application, one line above receives data from Redis. The issue is with HGET.

Also, I've increased timeouts on multiplexer configuration to 6 seconds with no luck.

In addition, I've checked that IDatabase instance has IsConnected with true value.

How to interpret these error messages and what's the issue behind the whole timeout?

Some background...

I've successfully fixed the issue changing when some code section gets a database (i.e. multiplexer.GetDatabase()).

While multiplexer has an instance per AppDomain as described in StackExchange.Redis documentation, many inversion of control components are creating many instances of IDatabase in their own code. That is, IDatabase instance isn't shared.

Actual code is performing a ListRightPopLeftPush, and after that, it's instantiating an inversion of control component which reads a hash key during component instatiation. If instantiate the whole component before doing the so-called ListRightPopLeftPush, then the whole HashGet won't throw the timeout exception.

It seems like even when ListRightPopLeftPush is executed from other IDatabase instance, it produces some kind of issue in the next IDatabase instance when it comes to perform read operations.

Anyway, my fix doesn't answer the question. I've just added more detailed info to let us find what's the issue and its own solution.

Update

Anyway, the above "fix" won't fix further read accesses to Redis. I'm getting the same timeout exception in further calls. And now in paramater found in exception's message says 60/1.

Community
  • 1
  • 1
Matías Fidemraizer
  • 63,804
  • 18
  • 124
  • 206
  • The in=79/1 tells me that there is data available and that a reader is processing it; I will need to check to see if there is any way the reader could be stalled here. – Marc Gravell Aug 29 '14 at 16:24
  • @MarcGravell Hey, and how you want to check that? Anyway, this is code at work, we should check that next monday... – Matías Fidemraizer Aug 29 '14 at 18:32
  • @MarcGravell Nevermind, I understand you're going to check in some test ;) – Matías Fidemraizer Aug 29 '14 at 18:36
  • @MarcGravell Did you get any clue about what's behind the issue? If you need further details you can ask for them and I'll add them to the question – Matías Fidemraizer Sep 01 '14 at 07:19
  • @MarcGravell I've added new details because I could "fix" the issue changing some execution order in my code. Maybe this gives you more info... – Matías Fidemraizer Sep 01 '14 at 07:50
  • `IDatabase` doesn't need to be shared; it is a lightweight pass-thru and does nothing more than hold the database number; are you *sure* your IoC is using the same multiplexer for each? – Marc Gravell Sep 01 '14 at 08:00
  • @MarcGravell Yeah, I'm not sharing `IDatabase` instances. I'm sure about the multiplexer as a single instance because it's instantiated in a static constructor of some factory which isn't an IoC component. – Matías Fidemraizer Sep 01 '14 at 08:05
  • not sharing IDatabase instances won't matter at all, unless you are creating so many of them that you're genuinely impacting GC; a static constructor should indeed result in a shared multiplexer; very odd. – Marc Gravell Sep 01 '14 at 08:10
  • @MarcGravell I've checked that it shares the multiplexer, but this was an expected behavior (because of the static constructor...). No idea of what can be the cause? Ah, I've updated to the latest StackExchange.Redis Nuget package and I'm still experiencing the same issue... argh! – Matías Fidemraizer Sep 01 '14 at 08:54
  • I'd be interested in seeing indicative code for the two (working vs non-working) scenarios; it is very hard to visualize at the moment – Marc Gravell Sep 01 '14 at 08:56
  • @MarcGravell I don't know if I said that the same line will work in other scenarios. I mean, the issue arises in a specific use case. – Matías Fidemraizer Sep 01 '14 at 08:56
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/60377/discussion-between-matias-fidemraizer-and-marc-gravell). – Matías Fidemraizer Sep 01 '14 at 08:58
  • @MarcGravell I've edited question's title. I believe that now it tells more about the issue. – Matías Fidemraizer Sep 01 '14 at 11:47
  • For info: I have an experimental fix - we're dogfooding it currently. It is available on nuget as [1.0.329](https://www.nuget.org/packages/StackExchange.Redis/1.0.329) – Marc Gravell Sep 01 '14 at 12:38
  • @MarcGravell I'm going to try it – Matías Fidemraizer Sep 01 '14 at 13:00
  • @MarcGravell Yeah! It worked for me! Thank you for your effort – Matías Fidemraizer Sep 01 '14 at 13:14
  • let me know how you get on with that – Marc Gravell Sep 01 '14 at 13:15
  • @MarcGravell yes, if I found any new problem around the same issue I'll give you more info, but it seems that the latest nuget that you published solves the issue (at least in my code) – Matías Fidemraizer Sep 01 '14 at 13:27

1 Answers1

7

Based on a long discussion in chat, and a lot of digging, it looks like in some obscure scenarios the TPL is hijacking the dedicated reader thread when we are doing things like .TrySetResult (which: we do often). This causes an instant deadlock if you make a synchronous call, since it can't possibly process any socket data if it is busy waiting for a task to complete (which would only ever be completed by itself). We do actually have code in place specifically to prevent this, but it looks like the workaround actually forces it to happen in some other scenarios. Which... is horrible. I will see what I can find. But basically, the problem is that currently, in some limited scenarios, TaskCompletionSource.TrySetResult is giving power to the TPL to run synchronous continuations. This includes Task.WhenAny.

Community
  • 1
  • 1
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • I don't know if I should mark this as *solved* or we can leave it open until you release a new version of StackExchange.Redis library solving the issue. What's your opinion? – Matías Fidemraizer Sep 01 '14 at 11:42
  • 3
    if you want, it would be useful that your answer would explain that starting from X version the issue got solved! – Matías Fidemraizer Sep 01 '14 at 13:15
  • I've got something like this happening in a Nancy app, and I'm not sure if this is to blame or not. Any updates on *what* the limited Scenarios *are* that can cause this to happen? – Chris Pfohl Dec 01 '14 at 18:01
  • In case it's important this is happening for me in version `1.0.371` framework 4.5, and I'm not able to reproduce it in unit tests, just the Nancy App. – Chris Pfohl Dec 01 '14 at 18:03
  • 1
    Does this mean we should or shouldn't be using async methods? – Wojtek Turowicz Jan 06 '15 at 11:31