
So I just started working with SQL Server about two months ago (I am still a newbie), and I have to improve the performance of a stored procedure. It runs in 3 seconds, but for some reason the clients are not satisfied with the results. I have tried to train myself to read the execution plans and figure out what the problems are.

After getting the SQL Sentry Plan Explorer, I figured out that only one part of the procedure is causing the problem, and that part is this:

With myAccount as
(
    select 
        ROW_NUMBER() over(order by Account) as Row_ID,
        ID, Account, 
        replace(Name, '*', '#_') Name, 
        Totaling 
    from 
        Account
    where 
        Company_ID = @company_id and Balance = 0)
,myR1C1 (ID, R1C1) as
(
    select 
        t1.ID, 
        case when t4.Account = t6.Account 
             then 'R[' + convert(nvarchar(10), t4.Row_ID - t1.Row_ID) + ']C'
             else 'R[' + convert(nvarchar(10), t4.Row_ID - t1.Row_ID) + ']C:R[' + convert(nvarchar(10), t6.Row_ID - t1.Row_ID) + ']C'  
        end R1C1 
        --t1.*,t2.*,t4.*,t6.*, t4.id-t1.id,t6.id-t1.id 
    from 
        myAccount t1
    cross apply 
        dbo.abx_sysSplitTwo(Totaling,'|') t2
    cross apply 
        (select top 1 
             Row_ID, ID, account 
         from myAccount t3 
         where t3.account >= t2.VFr 
           and t3.account <= t2.vto 
           and t1.account <> t3.account 
         order by t3.account) t4
    cross apply 
        (select top 1 
             Row_ID, ID, account  
         from myAccount t5 
         where t5.account >= t2.VFr 
           and t5.account <= t2.vto 
           and t1.account <> t5.account 
         order by t5.account) t6
)
, myAccount2 as
(
    Select 
        t1.*, t2.R1C1 
    from myAccount t1 
    left join 
       (select 
            ID, STUFF((select',' + R1C1 
                       from myR1C1
                       where ID = a.id 
                       for xml path ('')), 1, 1, '') as R1C1
        from 
            myR1C1 as a
        group by 
            id) t2 on t1.ID = t2.id
--  order by row_ID

-- Data1
select tv.id [<dang n="BudData" u="0" o="1" fmt="1" fn="ID"/>],tv.account, tv.Name
 ,case when len(tv.R1C1)>0 then '=subtotal(9,' + tv.r1c1 + ')' else convert(nvarchar,sum(case When tp.[Date] between dateadd(yy,-1,@FromDate) and dateadd(d,-1,@FromDate) then isnull(Bud,0) else 0 end)) end A
 ,case when len(tv.R1C1)>0 then '=subtotal(9,' + tv.r1c1 + ')' else convert(nvarchar,Sum(case When tp.[Date] between dateadd(yy,-1,@FromDate) and dateadd(d,-1,@FromDate) then isnull(tp.Actual,0) else 0 end)) end B
 ,case when len(tv.R1C1)>0 then '=subtotal(9,' + tv.r1c1 + ')' else convert(nvarchar,sum(case When tp.[Date] between dateadd(yy,0,@FromDate) and dateadd(d,-1,dateadd(yy,1,@FromDate)) then isnull(Bud,0) else 0 end)) end C
 ,case when len(tv.R1C1)>0 then '=subtotal(9,' + tv.r1c1 + ')' else convert(nvarchar,sum(case When tp.[Date] between dateadd(yy,0,@FromDate) and dateadd(yy,0,@YTD) then isnull(Bud,0) else 0 end)) end D
 ,case when len(tv.R1C1)>0 then '=subtotal(9,' + tv.r1c1 + ')' else convert(nvarchar,Sum(case When tp.[Date] between dateadd(yy,0,@FromDate) and dateadd(yy,0,@YTD) then isnull(tp.Actual,0) else 0 end)) end E 
 ,case when len(tv.R1C1)>0 then '=subtotal(9,' + tv.r1c1 + ')' else convert(nvarchar,Sum(case When tp.[Date] between dateadd(yy,0,@FromDate) and dateadd(yy,0,@YTD) then isnull(tp.Actual,0) else 0 end)-sum(case When tp.[Date] between dateadd(yy,0,@FromDate) and dateadd(yy,0,@YTD) then isnull(tp.Bud,0) else 0 end)) end F
 ,case when len(tv.R1C1)>0 then '' else convert(nvarchar,case when sum(case When tp.[Date] between dateadd(yy,0,@FromDate) and dateadd(yy,0,@YTD) then isnull(tp.Bud,0) else 0 end) between -1 and 1 then 0 else (Sum(case When tp.[Date] between dateadd(yy,0,@FromDate) and dateadd(yy,0,@YTD) then isnull(tp.Actual,0) else 0 end)/sum(case When tp.[Date] between dateadd(yy,0,@FromDate) and dateadd(yy,0,@YTD) then isnull(tp.Bud,0) else 0 end)*100) end) end G
 ,case when len(tv.R1C1)>0 then '=subtotal(9,' + tv.r1c1 + ')' else convert(nvarchar,Sum(case When tp.[Date] between dateadd(yy,0,@FromDate) and dateadd(yy,0,@YTD) then isnull(Rev,0) /* Rev was corrected here from REv to Faktisk */ else 0 end +case When tp.[Date] between dateadd(d,1,@YTD) and dateadd(d,-1,dateadd(yy,1,@FromDate)) then isnull(tp.Rev,0) else 0 end)) end H

I know it looks huge and maybe it is stupid of me to show it this way, but after two weeks of trying without much success, people have started to push me and complain that I am using too much time on this... and I honestly don't really know what else to do.

Up until now I have used the SQL Profiler tool to capture a file with the workload and feed it to the Tuning Advisor tool to see what recommendations it makes. I got some recommendations to build some statistics and indexes, which I did, but the difference was almost unnoticeable.

Another thing worth mentioning is that, before using Profiler and the Tuning Advisor, the estimated number of rows this part of the stored procedure should return was 1600, while the actual number of rows was 536. As far as I've read, that is not a good thing. Now comes the weird part: after applying the advice from the Tuning Advisor, instead of decreasing, the estimated number of rows increased to 2800, but the speed stayed pretty much the same at 2-3 seconds.

I know there are a lot more things that can be done, but I don't possess either the knowledge or the time to dig into them at the moment, so if anyone could point me in the right direction that would be great. If there is anything else I can provide to get to the bottom of this, I will gladly do so, so please just ask. And I almost forgot: the expected result would probably be 1 second or less, since it is only 536 rows, and my clients have seen queries that perform much faster on over 20,000 rows.
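If it matters, I understand that refreshing the statistics by hand would look something like this (assuming the Account table is in the dbo schema; I have not tried this yet), after which I would re-check estimated vs actual rows in the plan:

-- Rebuild optimizer statistics on the table the slow part reads from
-- (dbo.Account is an assumption; adjust the schema/table name as needed)
UPDATE STATISTICS dbo.Account WITH FULLSCAN;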

Mihai Enea
2 Answers


First, I invite you to read http://importblogkit.com/2015/05/how-do-you-eat-an-elephant/

You need to split your query into smaller pieces to determine where the problem is. You have already found out which part of your query is the slowest; now you should keep digging.

You have four queries: myAccount, myR1C1, myAccount2 and the final select. Run them one by one and ANALYZE/EXPLAIN each one to see what it does, how long it takes and which indexes it uses.

Check the number of rows, and try adding new indexes or a composite index to improve speed, as in the sketch below.
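For example, to inspect one piece in isolation (a minimal sketch using the first query, myAccount, with the column names taken from the question; declare or inline @company_id as needed):

-- Show I/O and timing statistics in the Messages tab
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT ROW_NUMBER() OVER (ORDER BY Account) AS Row_ID,
       ID, Account, REPLACE(Name, '*', '#_') AS Name, Totaling
  FROM Account
 WHERE Company_ID = @company_id
   AND Balance = 0;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;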

Now, if you find something that looks weird, read How to create a Minimal, Complete, and Verifiable example and create a new question with just that part.

Juan Carlos Oropeza
  • Although I applaud the link to the /elephant-blog/, I think the `ANALYZE/EXPLAIN` won't help much here as (s)he's on MSSQL (judging by the tags and code). Also, SQL Sentry Plan Explorer is pretty much a graphical EXPLAIN and does a splendid job at visualising what goes on inside the query processor. – deroby Oct 29 '15 at 09:32
  • Thank you for the information. I will try to put it to good use :) – Mihai Enea Oct 30 '15 at 09:28

Since you already have this running inside SQL Sentry Plan Explorer, could you run just this query and then copy the Plan XML data? (You could also save the .queryanalysis file, but it may contain info such as your server name, which you would probably rather not put online.) Simply copy-paste it to a pastebin, or zip it and put it on e.g. Dropbox, and share the URL here. That way we'll get a better look at what is really happening...
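If Plan Explorer is not handy, Management Studio can emit the XML plan directly; this is standard SQL Server syntax, nothing specific to your procedure:

-- Ask SQL Server to return the actual execution plan as XML
-- alongside the query results
SET STATISTICS XML ON;

-- ... run the slow part of the procedure here ...

SET STATISTICS XML OFF;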

At first sight it looks like you're doing a lot of ORDER BYs, which might be 'heavy on the machine', but I mostly fear the dbo.abx_sysSplitTwo() function. For starters, functions are really bad for performance and they'll certainly mess up the Query Optimizer's guesswork. Additionally, if the function is written in a 'non-optimal way', this might slow things down even more.

=> Am I right to assume that it splits something like 'ABC|XYZ' into vfr = 'ABC' and vto = 'XYZ'? Or is it supposed to return multiple records (e.g. when it finds 'ABC|XYZ|PQR|STU', does it return 2 records?), or something like that?
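For what it's worth, if the first assumption holds, a splitter like that can be written as a single-statement inline table-valued function, which the optimizer can fold into the main plan. This is only a sketch of what I imagine dbo.abx_sysSplitTwo does (the VFr/vto names are taken from your query; everything else is guesswork):

CREATE FUNCTION dbo.abx_sysSplitTwo_guess (@value nvarchar(4000), @sep nchar(1))
RETURNS TABLE
AS
RETURN
(
    -- Split 'ABC|XYZ' into VFr = 'ABC' and vto = 'XYZ';
    -- a value without a separator is used as both bounds
    SELECT VFr = CASE WHEN p.pos > 0 THEN LEFT(@value, p.pos - 1) ELSE @value END,
           vto = CASE WHEN p.pos > 0 THEN SUBSTRING(@value, p.pos + 1, 4000) ELSE @value END
      FROM (SELECT CHARINDEX(@sep, @value) AS pos) AS p
);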

Finally, I think you're missing part of the query.

Anyway, the trouble with Common Table Expressions is that one can easily get carried away building query upon query, making it impossible to figure out what exactly might be wrong.

I'd recommend splitting the first part into separate temp tables and then using e.g. Plan Explorer to see what is taking so long. (Look at duration; the 'cost' field might give an indication of which parts might not scale very well, but in the end the duration is what you care about now!) Adding indexes might seem excessive, but they have the benefit of implicitly adding (very good) statistics to the tables, and in this case they often help the queries that follow. Besides, indexing 'reasonably sized tables' takes VERY little time on modern hardware.

SELECT ROW_NUMBER() OVER (order by Account) as Row_ID,
       ID, Account, Name,
       Totaling
  INTO #myAccount
  FROM Account
 WHERE Company_ID = @company_id 
   AND Balance = 0

CREATE UNIQUE CLUSTERED INDEX uq0 ON #myAccount (Row_ID) WITH (FILLFACTOR = 100)
CREATE                  INDEX idx1 ON #myAccount (Account) WITH (FILLFACTOR = 100) -- used later on for t4 and t6

-- split the Totaling, I'm assuming multiple records can be returned by abx_sysSplitTwo
SELECT DISTINCT Totaling
  INTO #pre_split
  FROM #myAccount 

SELECT Totaling, vfr, vto
  INTO #split
  FROM #pre_split 
  CROSS APPLY dbo.abx_sysSplitTwo(Totaling,'|') t2

CREATE CLUSTERED INDEX idx0 ON #split (Totaling) WITH (FILLFACTOR = 100)

-- apply the split values
 select t1.ID, 
        case when t4.Account = t6.Account 
             then 'R[' + convert(nvarchar(10), t4.Row_ID - t1.Row_ID) + ']C'
             else 'R[' + convert(nvarchar(10), t4.Row_ID - t1.Row_ID) + ']C:R[' + convert(nvarchar(10), t6.Row_ID - t1.Row_ID) + ']C'  
        end R1C1 
  INTO #myR1C1        
  FROM #myAccount t1
  JOIN #split t2
    ON t2.Totaling = t1.Totaling
  CROSS APPLY 
        (SELECT TOP 1 Row_ID, ID, Account 
          FROM #myAccount t3 
         WHERE t3.Account >= t2.VFr 
           AND t3.Account <= t2.vto 
           AND t3.Account <> t1.Account 
         ORDER BY t3.Account) t4
    CROSS APPLY 
        (SELECT TOP 1 Row_ID, ID, Account  
          FROM #myAccount t5 
         WHERE t5.Account >= t2.VFr 
           AND t5.Account <= t2.vto 
           AND t5.Account <> t1.Account
         ORDER BY t5.Account) t6

CREATE CLUSTERED INDEX idx0 ON #myR1C1 (ID) WITH (FILLFACTOR = 100)

-- final result (concatenate R1C1 fields)
SELECT t1.Row_ID, t1.ID, t1.Account, 
       REPLACE(t1.Name, '*', '#_') Name, 
       t2.R1C1
  FROM #myAccount t1
  LEFT OUTER JOIN  (SELECT ID, 
                           STUFF((SELECT ',' + t.R1C1 
                                    FROM #myR1C1 t
                                   WHERE t.ID = a.ID 
                                     FOR XML PATH ('')), 1, 1, '') as R1C1
                     FROM #myR1C1 as a
                    GROUP BY ID) t2 
               ON t1.ID = t2.ID

PS: Yes, I know, the better approach would be to do SELECT .. INTO #table FROM ... WHERE 1 = 2 first and then fill it up using an INSERT INTO, but for this test I'm being lazy...
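For completeness, that pattern would look roughly like this (same assumed columns as above):

-- Create the empty temp table first (WHERE 1 = 2 copies structure, no rows)
SELECT ROW_NUMBER() OVER (ORDER BY Account) AS Row_ID,
       ID, Account, Name, Totaling
  INTO #myAccount
  FROM Account
 WHERE 1 = 2;

-- Then populate it in a separate, explicit step
INSERT INTO #myAccount (Row_ID, ID, Account, Name, Totaling)
SELECT ROW_NUMBER() OVER (ORDER BY Account),
       ID, Account, Name, Totaling
  FROM Account
 WHERE Company_ID = @company_id
   AND Balance = 0;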

deroby
  • Thank you for the guidance; unfortunately I will not be able to try it out or provide any other information at the moment due to some other tasks that I was given. I shall come back with an answer next Monday – Mihai Enea Oct 30 '15 at 09:28