We are intermittently seeing requests hang for particular users. When this occurs, our load balancer returns a 504 after 1 minute. If the user then refreshes, they still get no response. For other users, the site continues to function normally.

I have used WinDbg to inspect the threads, and some appear to be suspended waiting on Redis calls, so I assume there is some sort of deadlock.

Example threads:

0:052> !clrstack
OS Thread Id: 0x584 (52)
        Child SP               IP Call Site
00000015d5b0db38 00007ffef91206fa [InlinedCallFrame: 00000015d5b0db38] StackExchange.Redis.SocketManager.select(Int32, IntPtr[], IntPtr[], IntPtr[], TimeValue ByRef)
00000015d5b0db38 00007ffe962b2b51 [InlinedCallFrame: 00000015d5b0db38] StackExchange.Redis.SocketManager.select(Int32, IntPtr[], IntPtr[], IntPtr[], TimeValue ByRef)
00000015d5b0db00 00007ffe962b2b51 DomainBoundILStubClass.IL_STUB_PInvoke(Int32, IntPtr[], IntPtr[], IntPtr[], TimeValue ByRef)
00000015d5b0dbe0 00007ffe962b20f1 StackExchange.Redis.SocketManager.ReadImpl()
00000015d5b0ddc0 00007ffe962b1496 StackExchange.Redis.SocketManager.Read()
00000015d5b0de00 00007ffeed45ca72 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
00000015d5b0ded0 00007ffeed45c904 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
00000015d5b0df00 00007ffeed45c8c2 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00000015d5b0df50 00007ffeede3100c System.Threading.ThreadHelper.ThreadStart(System.Object)
00000015d5b0e1a8 00007ffeeeb66793 [GCFrame: 00000015d5b0e1a8] 
00000015d5b0e4f8 00007ffeeeb66793 [DebuggerU2MCatchHandlerFrame: 00000015d5b0e4f8] 
00000015d5b0e688 00007ffeeeb66793 [ContextTransitionFrame: 00000015d5b0e688] 
00000015d5b0e8b8 00007ffeeeb66793 [DebuggerU2MCatchHandlerFrame: 00000015d5b0e8b8] 

And:

0:050> !clrstack
OS Thread Id: 0x2784 (50)
        Child SP               IP Call Site
00000015d51ae118 00007ffef9120c6a [GCFrame: 00000015d51ae118] 
00000015d51ae1e8 00007ffef9120c6a [HelperMethodFrame_1OBJ: 00000015d51ae1e8] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
00000015d51ae300 00007ffe962a15cc StackExchange.Redis.SocketManager.WriteAllQueues()
00000015d51ae380 00007ffe962a134f StackExchange.Redis.SocketManager.b__1e(System.Object)
00000015d51ae3c0 00007ffeed45ca72 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
00000015d51ae490 00007ffeed45c904 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
00000015d51ae4c0 00007ffeed45c8c2 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00000015d51ae510 00007ffeede3100c System.Threading.ThreadHelper.ThreadStart(System.Object)
00000015d51ae768 00007ffeeeb66793 [GCFrame: 00000015d51ae768] 
00000015d51aeab8 00007ffeeeb66793 [DebuggerU2MCatchHandlerFrame: 00000015d51aeab8] 
00000015d51aec48 00007ffeeeb66793 [ContextTransitionFrame: 00000015d51aec48] 
00000015d51aee78 00007ffeeeb66793 [DebuggerU2MCatchHandlerFrame: 00000015d51aee78] 
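
A minimal sketch of a WinDbg/SOS session that produces stacks like these and also checks for monitor ownership, assuming a .NET Framework 4.x process (so SOS loads alongside clr.dll). These are standard SOS commands, not a record of the exact session used here:

```
$$ Load the SOS extension matching the CLR in the process (assumes .NET Framework 4.x)
.loadby sos clr
$$ List managed threads, including the lock count held by each
!threads
$$ Dump the managed stack of every thread
~*e !clrstack
$$ Show sync blocks (monitors) that are currently owned, and by which thread
!syncblk
```

If `!syncblk` reports no owned monitors, threads sitting in `Monitor.ObjWait` may simply be idle and waiting for work rather than deadlocked.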

This question seems similar; however, I am not calling the StackExchange.Redis library directly. Instead, it is configured as the out-of-the-box Sitecore session provider, so our interaction with it is via the standard ASP.NET API:

HttpContext.Current.Session["object"] = someObject; // someObject: placeholder for any serializable value

What is likely to be the cause of this issue and/or what other steps could I take to identify the cause?

David Masters
  • Well, between 1.x and 2.x of SE.Redis, we *completely rewrote* most of the network related code to avoid a number of pain points - it is *possible* that this is related to one of the things we were changing, but: if you're using tools that *use* SE.Redis (I'm not familiar with sitecore), *they'd* need to update to 2.x too – Marc Gravell Feb 04 '19 at 16:58
  • Redis bugs have been flaring up lately. Apparently it's a Sitecore bug. Part of the solution is to increase the "retryTimeoutInMilliseconds". – Marcel Gruber Feb 04 '19 at 23:04
  • @marc gravell is there anything you can think of that we could be doing in code that would inadvertently cause issues with SE.Redis? – David Masters Feb 05 '19 at 07:19
  • @marcel do you have any reference for the solution? – David Masters Feb 05 '19 at 07:19
  • Try these guys: https://kb.sitecore.net/articles/175524 | https://kb.sitecore.net/articles/464570 – Marcel Gruber Feb 05 '19 at 15:57
