6

Pardon me if this is a duplicate. The closest I could find was Random timeout running a stored proc - drop recreate fixes but I'm not certain the answers there about recompiling the stored procedure apply.

I have an Azure SQL database, latest version, that has a lot of traffic from an Azure web app front end. I have a nightly remote job that runs a batch to rebuild indexes on the Azure SQL database as that seems to help greatly with controlling database size and performance.

Normally, the rebuilding of indexes takes about 20 minutes. Last night it timed out after 2 hours. The error handler in that batch did not log any errors.

Soon after rebuilding indexes was started, one particular stored procedure starting timing out for every client calling it. Other stored procedures using the same tables were not having any issues. When I discovered the problem, I could alleviate all the timeouts and suspended processes by altering the stored procedure to immediately return. When I altered the stored procedure again to behave normally, the issues reappeared immediately. My understanding is that altering the stored procedure forced it to recompile but that didn't fix it.

Ultimately, I completely dropped and recreated the procedure with the original code and the issue was resolved.

This procedure and the schema it uses have been completely stable for many months. The procedure itself is quite simple:

CREATE Procedure [dbo].[uspActivityGet] (@databaseid uniqueidentifier) AS
begin
    SET NOCOUNT ON;
    --There may be writing activities to the table asynchronously, do not use nolock on tblActivity - the ActivityBlob might be null in a dirty read.
    select top 100 a.Id, h.HandsetNumber, a.ActivityBlob, a.ActivityReceived
    from dbo.tblDatabases d with(nolock) join dbo.tblHandsets h with(nolock) on d.DatabaseId = h.DatabaseId join dbo.tblActivity a on h.Id = a.HandsetId
    where d.DatabaseId = @databaseid and a.ActivitySent is null
    order by a.ActivityReceived
end

While the procedure would hang and time out with something like this:

exec dbo.uspActivityGet 'AF3EA01B-DB22-4A39-9E1C-D096D2DF1215'

Running the identical select in a query window would return promptly and successfully:

declare @databaseid uniqueidentifier; set @databaseid = 'AF3EA01B-DB22-4A39-9E1C-D096D2DF1215'
select top 100 a.Id, h.HandsetNumber, a.ActivityBlob, a.ActivityReceived
from dbo.tblDatabases d with(nolock) join dbo.tblHandsets h with(nolock) on d.DatabaseId = h.DatabaseId join dbo.tblActivity a on h.Id = a.HandsetId
where d.DatabaseId = @databaseid and a.ActivitySent is null
order by a.ActivityReceived

Any ideas how I can prevent this from happening in the future? Thank you.

Edit - Adding execution plan screenshot Execution Plan of stored procedure above

Edit - Adding query used to view running processes. There were many, guessing aproximately 150, in the suspended state and they were all for the same stored procedure - uspActivityGet. Also, Data IO Percentage was maxed out the whole time when it normally runs 20 - 40% in peak demand times. I don't recall what the wait type was. Here is the query used to view that.

select * from sys.dm_Exec_requests r with(nolock) CROSS APPLY sys.dm_exec_sql_text(r.sql_handle)  order by r.total_elapsed_time desc

Edit - It happened again tonight. Here is the execution plan of the same procedure during the issue. After dropping and creating the procedure again, the execution plan returned to normal and the issue was resolved.

During the issue, sp_executesql with the identical query took about 5 minutes to execute and I believe that is representative of what was happening. There were about 50 instances of uspActivityGet suspended with wait type SLEEP_TASK or IO_QUEUE_LIMIT.

Perhaps the next question is why is index rebuilding or other nightly maintenance doing this to the execution plan?

enter image description here

Community
  • 1
  • 1
RJBreneman
  • 787
  • 7
  • 21
  • other procedures that does not have issues use the very same joins? maybe different procedures leverage different indexes? does `d.DatabaseId` is indexed? does the `NOLOCK` hints are required? maybe indexing issues are hidden by that hint? what happen if you run the query from SSMS when the procedure is failing? – Paolo Jan 07 '16 at 16:12
  • @Paolo running the identical select statement in a query window would return promptly and successfully. – RJBreneman Jan 07 '16 at 18:46
  • Are you able to tell if the proc is blocked or doing something for a very long time or something else? When you provide the GUID as a constant in the query, it changes how SQL Server costs the query so the test isn't really running both queries in the same way. Try testing with sp_executesql N'--your query', N'@databaseid uniqueidentifier', '--your param' and see if it still runs fast or gets bogged down. Also, share your execution plans so we can see what SQL Server was thinking – SQLmojoe Jan 07 '16 at 22:47
  • @SQLmojoe Since I dropped and recreated the procedure, it is now behaving well regardless of how I test it. I've added the current execution plan and what was observed about the suspended processes during the issue. Are there some better details I should look for if I see it happening again? Thanks. – RJBreneman Jan 08 '16 at 02:02
  • Have you tried Paolo's suggestion and removed the `NOLOCK` hint from the query? Also, maybe you could try updating the statistics as well before rebuilding the indexes? Another try: add `WITH RECOMPILE` to the stored procedure? Maybe something will alter the behaviour you are seeing and will lead you to a more robust solution? GL. – TT. Jan 08 '16 at 10:03
  • @Paolo "maybe indexing issues are hidden by that hint" may be more relevant than I realized as this issue seems precipitated by index rebuilding. I will remove NOLOCK. – RJBreneman Jan 08 '16 at 10:40
  • @TT. I will investigate updating statistics before rebuilding indexes but I have to be cautious of the effect of doing so. I ran DBCC CHECKTABLE on tblActivity earlier tonight and it ran for 4.5 hours. – RJBreneman Jan 08 '16 at 10:44
  • @RJBreneman Sure I'm not saying it will help. I gave 3 hints, I would first try removing NOLOCK, then WITH RECOMPILE and if those don't help try updating statistics. – TT. Jan 08 '16 at 10:49
  • Is WITH RECOMPILE a possible performance hit? Using dm_exec_query_stats I get the following on this procedure: Total CPU Time - 5144744, Execution Count - 619, Avg CPU Time - 8311. Actually, Execution Count jumped to 1925 in just a few minutes. It's a busy database. – RJBreneman Jan 08 '16 at 11:23
  • It is a performance hit yes. But IMO negligable. How many times is the SP called per minute? And if it is ~1/minute do a couple of extra ms matter all that much? More importantly I suggested it to see if your problem still occurs with the WITH RECOMPILE hint. I'm thinking your SP gets a bogus execution plan while the indices are being rebuilt and maybe the RECOMPILE produces a better one at that time? – TT. Jan 08 '16 at 11:38
  • It's getting called ~100 times per minute now, peak demand times will increase size of binary blobs returned substantially too. There is also the earlier observation that forcing a recompile by altering the procedure did not fix the issue, it was only fixed by drop create. – RJBreneman Jan 08 '16 at 11:49
  • Alright MS Azure, somebody please tell me what is going on here. I remove the NOLOCK hints and immediately the number of suspended calls to the procedure starts jumping by the dozens per minute. I put the NOLOCK back and they all clear up. I repeated this cause - effect multiple times. – RJBreneman Jan 08 '16 at 12:12
  • Report it to MS. You bought their product and you probably have a support contract going, yes? – TT. Jan 08 '16 at 12:40

1 Answers1

1

The clues are in the query and the troublesome execution plan. See Poor Performance with Parallelism and Top

The normal execution plan seems quite efficient and shouldn't need recompiled as long as the relevant schema doesn't change. I also want to avoid parallelism in this query. I added the following two options to the query for assurance on both points and all is happy again.

OPTION (KEEPFIXED PLAN, MAXDOP 1)
RJBreneman
  • 787
  • 7
  • 21