0

We've been having this problem with an MVC application for a few weeks now. Every now and then the site will hang. Resource monitor would show the app using a large amount of threads (above 100). A quick peek to the database (SQL Server) revealed that for each of those threads there is also a connection that is doing absolutely nothing (and has never actually run any query at all). We looked through the hanging requests and there are some 100 legit page requests. We've been looking through those requests in search for a some error, but those pages all seem to work normally and the problem seems difficult to recreate. Edit: We didn't look carefully enough, the answer was there. To sum it up:

  • app runs normally
  • something bad happens that causes connections to database to do nothing, but still allows to make them
  • because of this normal requests to the site are never completed and make the app run out of resources (connections in connection pool and worker threads); this doesn't go away until the app is restarted (or better yet app pool recycled, as restarting the app doesn't kill the hanged threads)

Does anyone have an idea what could be causing this behavior? The problem might (or might not) be in our code, but so far it has proven to be extremely difficult to pinpoint. Any clues are most welcome.

jahu
  • 5,427
  • 3
  • 37
  • 64
  • Not much anyone can do without seeing the code. But are you not closing all your db connections? – DavidG Jul 08 '16 at 13:00
  • Are you wrapping your database calls within [`using`](https://msdn.microsoft.com/en-us/library/yh598w02.aspx) statements? This can generally help with issues like this to ensure that your database connections are both disposed of and closed properly. – Rion Williams Jul 08 '16 at 13:01
  • @DavidG We generally close all our connections. When the app runs normally there are no connection leaks. It is only after some unknown event that every connection = leak. The only real clue is that when we checked in the database none of the opened connections (besides the 5 the app normally uses) has actually done anything (http://stackoverflow.com/questions/212596 query shows NULL in last_sql column). – jahu Jul 08 '16 at 13:08

2 Answers2

1

Obviously is difficult to for everyone to come up with an answer since no code has been shown so here are a few generic suggestions:

  1. Abstract your database code, create something like a separate DataLayer if you don't have one already.

  2. Control the connections you open in the code, make sure you don't leave anything open whatever happens, this means always use a using block so that the connection is properly disposed of at the end.

  3. add some logging so you can see exactly when connections open and what they do. Connections don't randomly open, this is caused by something in your code.

  4. Make sure you don't have something like a repository system which returns IQueryable stuff, that's usually a recipe for disaster. Yes Lazy loading data sometimes is great but when it gets out of hand it will bite you.

I wish I could be more helpful but you haven't put enough detail in your question I'm afraid.

Andrei Dragotoniu
  • 6,155
  • 3
  • 18
  • 32
  • We managed to solve the issue. It wasn't any of the obvious causes. We had a side-wide hang, that was caused by a view failing to render which was caused by a custom unobtrusive validator malfunctioning randomly (had the code been run outside a view, we would get an exception at worst, but not a site-wide hang). – jahu Jul 21 '16 at 10:57
1

Hight thread count (as seen is Resource Monitor) and high connection count as seen in SQL Server (see the query below) indicate a site-wide hang.

SELECT 
    DB_NAME(dbid) as DBName, 
    COUNT(dbid) as NumberOfConnections,
    loginame as LoginName
FROM
    sys.sysprocesses
WHERE 
    dbid > 0
GROUP BY 
    dbid, loginame

I expected this to be a database related problem seeing that most of the DB connections were idle (as tested using following query), but that proved to be wrong assumption.

SELECT
    S.spid,
    login_time,
    last_batch,
    status,
    hostname,
    program_name,
    cmd,
    (
        select text from sys.dm_exec_sql_text(S.sql_handle)
    ) AS last_sql
FROM
    sys.sysprocesses S
WHERE
    dbid > 0
    AND DB_NAME(dbid) = 'database_name'
    AND loginame = 'user_name'
ORDER BY last_batch ASC

What I should have checked instead was hanging worker threads (in IIS select root server then Worker Processes and then the app pool for your site). Alternative way to get those is by writing this into the command line:

%windir%\system32\inetsrv\appcmd list requests /elapsed:30000 > hanging-requests.txt

This revealed that the cause of the hang was a single controller action (it was always first or second on the list of hanging requests). We couldn't recreate the problem on development machine and we had trouble getting memory dumps of the site. What we did instead was the most simple and primitive solution. We wrote a simple logger that saves data to txt file. We decorated the malfunctioning action with log entries (like "Action started", "Grabbing model from database" etc.) and waited for something to happen. It turned out the action itself worked fine, but the view never completed. We then started logging stuff inside the view and eventually we found the culprit behind the hang.

The culprit was an overengineered validation attribute. This validator had a client side portion too. Simply put, when Razor was rendering an input for a field decorated with this validator, when something went amiss in the validator it would hang. Most likely it was a database connection that was placed in the constructor (we're not 100% sure). I imagine that every now and then there was a timeout that would throw an exception and hang the request (which would in turn block further requests as well).


Edit: Here is an answer that explains how to do logging of hanging actions even better. I didn't end up using it, but it made me realize what I need to do.

Community
  • 1
  • 1
jahu
  • 5,427
  • 3
  • 37
  • 64