4

I have an interesting problem going on. I recently moved 2 SQL databases to SQL Azure for a client and all seemed to be going well...at first. Mid-morning I get a spike of error emails for various things, but a few common ones:

-The request limit for the database is 90 and has been reached.
-Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
-A transport-level error has occurred when receiving results from the server.

There's obviously some database related issues going on with the move to Azure, or the existing code in general. The errors that seemed to happen the most were request limit and timeouts. Once they started, they never seemed to stop. And I don't think there were many users using the site today. It almost seemed like the connection continued to try to connect on a different thread in the background if this makes any sense. This is in reference to the "The timeout period elapsed prior to completion of the operation or the server is not responding." I would get an error email, I'd check the page it referenced myself, and it would load immediately. I checked with the user who threw the error and they reported everything was fine. Strange. Yet I continued every few minutes to get the same error email.

I currently have them on the S1 Tier which limits the requests to 90 concurrently. I did some digging and found the following SQL query:

select * from sys.dm_exec_connections

I ran this, and it showed I had over 90 active connections, some of which were opened some time ago. This was strange to me as the site was currently not being used (it's really late at night and I know no one is using the site). I wanted to end all the connections so I came up with the following query:

DECLARE @sessionId int
DECLARE @SQL nvarchar(1000)
DECLARE @clientIP nvarchar(50)

set @clientIP = 'XX.XX.XX.XX'

select @sessionId = min( session_id ) from sys.dm_exec_connections where   client_net_address = @clientIP

while @sessionId is not null
begin
    SET @SQL = 'KILL ' + CAST(@sessionId as varchar(4))
    EXEC (@SQL)
    select @sessionId = min( session_id ) from sys.dm_exec_connections where session_id > @sessionId and client_net_address = @clientIP
end

I tried running this command, but the connections came right back. I went on the web server and manually stopped the site in IIS, ran the KILL command again but the connections remained. I put up the app_offline file and took the site down for about a half hour to see if any lingering connections would drop, but they didn't. And I still continued to get error emails for pages I KNEW were not accessible because I stopped the Site AND app pool. I went on the server and manually stopped the w3wp process and ran SQL KILL statements to kill the connections. They finally went away! I put the app back online and hit a single page. I kept running the above query to see the active connections and sure enough every time I ran the query the active connection count kept creeping up. It stops around 102 as of right now. And that's me as a user hitting a single page. I'm guessing this isn't normal? Does this indicate connections are lingering out there and not being dropped or closed?

I just made code changes recently adding Entity Framework. Wherever I'm grabbing data through EF, im using so with a using statement on the context. The rest of the app is sort of old and is using TableAdapters. I see in some places it's following the same pattern with using statements, other places Dispose is being called. I haven't had a chance to track down all the usages yet. Is this a good place to start looking? Anyone have any suggestions on how to track this 'leak' down? I'm not super knowledgeable with SQL so any help would be greatly appreciated!

rickers
  • 59
  • 5
  • I appreciate the time you've taken to write this question, but is there any chance you could make it shorter? Maybe one quarter the size? – Enigmativity Sep 29 '15 at 08:33
  • 1
    Fwiw, I think you have provided an excellent level of detail :) I am struggling to see what would cause this, short of something that is auto retrying. Maybe a statement that now takes too long to run on Azure and is automatically retried? You can enable Auditing in Azure to see all the statements and execution time. It may not be the thing you are getting an error about that is the problem. – flytzen Sep 29 '15 at 09:42
  • 1
    I did implement the SqlAzureExecutionStrategy which auto retries a certain number of times before giving up on a connection. Maybe this has gone rogue. – rickers Sep 29 '15 at 14:51
  • @Enigmativity - sorry for providing too much detail. I figured someone more knowledgeable than me would be able to spot a common problem if I accurately explained what was going on. – rickers Sep 29 '15 at 14:54
  • 1
    Eric, if I were you, I'd enable SQL Auditing on your database. It should be able to show you any big or repeating SQL statements. Alternatively have a look at the dmvs, though I suspect the Auditing will be easier to interpret. – flytzen Sep 29 '15 at 20:49

0 Answers0