0

So, I have an SQL query for MSSQL looking like this (simplified for readability):

SELECT ...
FROM (
    SELECT ..., ROUND(SUM(TOTAL_TIME)/86400.0,2) ... 
    FROM MY_DATA
    WHERE STATUS NOT IN (107)
    GROUP BY ...
) q
WHERE q.Tdays > 0
GROUP BY ...   

It works fine, but I need a comparison against another table in the inner query, so I added a left join and said comparison:

SELECT ...
FROM (
    SELECT ..., ROUND(SUM(TOTAL_TIME)/86400.0,2) ... 
    FROM MY_DATA
    LEFT JOIN OTHER_TABLE ON MY_DATA.ID=OTHER_TABLE.ID //new JOIN
    WHERE STATUS NOT IN (107) AND (DEPARTMENT_ID='SP' OR DEPARTMENT_ID='BL') //new AND branch
    GROUP BY ...
) q
WHERE q.Tdays > 0
GROUP BY ...  

This query works, but is A LOT slower thant the previous one. The wierd thing is, commenting out the new AND-branch of the WHERE clause while leaving the JOIN as it is makes it faster again. As if it's not joining another table that is slowing the query down, but the actual string comparisons... I am lost as to why this is so slow, or how I could speed it up... any advice would be appreciated!

Alex
  • 1,157
  • 3
  • 11
  • 25
  • SQL is notoriously slow at comparing strings - especially if they're not indexed. I don't think there is a way around this, unfortunately. In your example, perhaps you could create a department table with a numeric value (1 - 'SP'...2 - 'BL'...etc), join to that, and compare based on the number. That would probably be a lot faster, - this is why I always use an integer field for a PK, even when the text is unique (like your department IDs). – Stan Shaw Jun 27 '17 at 14:57
  • Creating an Index on DEPARTMENT_ID would help you. – Jigar Jun 27 '17 at 15:03
  • Is `ID` a primary key on `OTHER_TABLE`? – Aurélien Gasser Jun 27 '17 at 15:15
  • Yes, it is on both. – Alex Jun 27 '17 at 15:17
  • 1
    You ought to look at / show the explain plan. https://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan – Philip Couling Jun 27 '17 at 15:22
  • I'll have to look into execution plans, thanks for the info. – Alex Jun 27 '17 at 15:25
  • Do you have a combined index on both STATUS and DEPARTMENT_ID? – MKR Jun 27 '17 at 16:50
  • It's not clear which columns are coming from where but if you are not actually _selecting_ any columns from the table, only filtering on them, you can sometimes improve performance by using `EXISTS`.. sometimes – Nick.Mc Jun 28 '17 at 05:48

3 Answers3

0

Use an INNER JOIN. The outer join is being undone by the WHERE clause anyway:

SELECT ..., ROUND(SUM(TOTAL_TIME)/86400.0,2) ... 
FROM MY_DATA d INNER JOIN
     OTHER_TABLE ot
     ON d.ID = ot.ID //new JOIN
WHERE od.STATUS NOT IN (107) AND DEPARTMENT_ID IN ('SP', 'BL') //new AND branch
GROUP BY ...

(The IN shouldn't make a difference; it is just easier to write.)

Next, if this still has slow performance, then look at the execution plans. It means that SQL Server is making a poor decision, probably on the JOIN algorithm. Normally, I fix this by forbidding nested loop joins, but there might be other solutions as well.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Your code example still contains a left join - I guess it's just an oversight/typo. Replacing LEFT JOIN with INNER JOIN does not noticeably speed things up... I'll have to look into execution plans - thanks for the info. – Alex Jun 27 '17 at 15:23
0

It's hard to say definitively what will or won't speed things up without seeing the execution plan. Also, understanding how fast you need it to be affects what steps you might want to (or not want to) consider taking.

What follows is admittedly somewhat vague, but these are a few things that came to mind when I thought about this. Take a look at the execution plan as Philip Couling suggested in that good link to get an idea where the pain points are, and of course, take these suggestions with a grain of salt.

  • You might consider adding some indexes to either or both of the tables. The execution plan might even give you suggestions on what could be useful, but off the top of my head, something on OTHER_TABLE.DEPARTMENT_ID probably wouldn't hurt.
  • You might be able to build potential new indexes as Filtered Indexes if you know those hard-coded search terms (like STATUS and DEPARTMENT_ID are always going to be the same).
  • You could pre-calculate some of this information if it's not changing so rapidly that you need to query it fresh on every call. This comes back to how fast you need it to go, because for just about any query, you can add columns or pre-populated lookup tables to avoid doing work at run time. For example, you could make an new bit field like IsNewOrBranch or IsStatusNot107 (both somewhat egregious steps, but things which could work). Or that might be pre-aggregating the data in the inner query ahead of time.
  • I know you simplified the query for our benefit, but that also makes it a little hard to know what's going on with the subquery, and the subsequent GROUP BY being performed against that subquery. There might be a way to avoid having to do two group bys.
  • Along the same vein, you might also look into splitting what you're doing into separate statements if SQL is having a difficult time figuring out how best to return the data. For example, you might populate a temp table or table variable with the results of your inner query, then perform your subsequent GROUP BY on that. While this approach isn't always useful, there are many times where trying to cram all the work into a single query will actually end up being worse than several individual, simple, optimized steps would be.
  • And as Gordon Linoff suggested, there are a number of query hints which could be used to coax the execution plan into doing things a specific way. But be careful, often that way lies madness.
Xedni
  • 3,662
  • 2
  • 16
  • 27
0

Your SQL is fine, and restricting your data with an additional AND clause should usually not make it slower.

As it happens, choosing a fast execution path is a hard problem, and SQL Server sometimes (albeit seldom) gets it wrong.

What you can do to help SQL Server find the best execution path is to:

  • make sure the statistics on your tables are up-to-date and
  • make sure that there is an "obviously suitable" index that SQL Server can use. SQL Server Management studio will usually give you suggestions on missing indexes when selecting the "show actual execution plan" option.
Heinzi
  • 167,459
  • 57
  • 363
  • 519