Issue with non responding website. How to debug?

Question

We have a website created in asp-mvc4 running on iss on windows server 2012 and using MSSQL 2012 as data storage. Connections are done using entity framework-6... Very standard stuff.

We are not a high volume website (max 3000 users around the world so hitting it in different timezones)

The issue is that sometimes without warning the site becomes unresponsive (browser does not show it and time out). Nothing special but here is the strange issues:

The server itself is working fine if you terminal server into it
Restarting the ISS does not help er there are no error logs
SQL server have around 100 connections from the website all sleeping (but killing theses processes does not make the site recover)
SQL server at the time show half of them as waiting tasks but it is still responsive if executing sql from SSMS or even remote from excel (remote reporting)
Looking at SQL Profiler website is still sending in SQL request despite being down but they are all request like this: if db_id('dbname') is not null else select... (Not something specifically written in the website)
the really strange one: If we restart the SQL server the website becomes responsive again)

I know this is not a lot to go on but we are very puzzled and don't really know how to proceed. Northing indicate error in any kind of log (website, iss, sql server or windows). I can deduct it must be the website thinking SQL cannot give it what it need because connection pool or something is used up but why it is only freed up with a complete sql server restart and not just killing the processes really puzzles me, and why the connection pool buildup happen in the first place since and sql is handled in entity framework

Any advice on how to debug further is most welcome

Start disabling functionality until the site no longer hangs. — , Mar 15 '15 at 06:01
Dump the w3wp.exe process's memory in task manager then check it out in WinDbg with SOS. It should show you all the requests and what they are doing. You might have 100 requests stuck on AcquireSessionState or something like that. Bad news: this is not an easy process. Or, get something like New Relic and see if it gives you some telemetry about long-running requests. http://justanothersoftwareengineer.blogspot.com.au/2009/10/tracking-down-iis-deadlock-with-windbg.html — ta.speot.is, Mar 15 '15 at 06:02
*SQL server have around 100 connections from the website all sleeping* Note that 100 is a magic number for the ASP.NET Connection Pool. http://stackoverflow.com/a/6197018/242520 It sounds like you're leaking connections, Entity Framework or otherwise. Try explicitly setting a higher number for the connection pool in your connection string. If you set it to 200 and then you have approx. 200 connections when the site dies => you are leaking connections. I believe you can make Entity Framework take a SqlConnection but not dispose it, but otherwise EF is quite reliable at disposing connections. — ta.speot.is, Mar 15 '15 at 06:03
Thanks. Will try with higher pool as test but really does not explain why killing the processes does not make it recover but only make new sleeping processes with the "if db_id('dbname') ... Statement" (only restart of sql itself does) — RWJ, Mar 15 '15 at 06:10
Thinking about it more, perhaps you are leaking connections but perhaps you are also actually using 100 connections. E.g. if you make a new `DbContext` for every request, then every request gets blocked trying to acquire some other resource ... you might have code set up to dispose everything neatly but the requests are just queuing up and not continuing to completion. So no disposal. When you kill vs. restart are you killing 100 connections or just 1? Because if there are a bunch of requests for a connection out of the connection pool queuing up, killing 1 will reduce the count for an instant — ta.speot.is, Mar 15 '15 at 06:24
Killed all 100 sql connections. Is was as the website just ignored it — RWJ, Mar 15 '15 at 06:42
Could you have a look at the deadlock reports on the server? (look at the all blocking transactions report). Does it show some data? — Stephen Reindl, Mar 16 '15 at 22:47
Further information I was able to find during monitoring.. We get these sql/ Windows event errors at the time of the error: SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\dbNa e.mdf] in database [dbNa,e] (5). The OS file handle is 0x0000000000000A7C. The offset of the latest long I/O is: 0x000003e104e000 — RWJ, Mar 18 '15 at 19:55
Then error from ESENT in event viewer: svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 1806336 (0x00000000001b9000) for 4096 (0x00001000) bytes has not completed for 36 second(s). This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem. ... Anyone seen this before or is this as mention truly hardware issue (machine is virtual) — RWJ, Mar 18 '15 at 19:59

Issue with non responding website. How to debug?

0 Answers0