
I'm currently running this query:

SELECT  Time_ID,
    Site_Type_ID,
    Abandoned_ID,
    WorkType_ID,
    SUM (staging.dbo.incoming_measure.ring_time) AS Ring_Time,
    SUM (staging.dbo.incoming_measure.hold_time) As Hold_Time,
    SUM (staging.dbo.incoming_measure.talk_time) AS Talk_Time,
    SUM (staging.dbo.incoming_measure.acw_time) AS ACW_Time,
    COUNT(*) CallCount
FROM incoming_measure
INNER JOIN DataMartEnd.dbo.Time_Dim 
        ON incoming_measure.StartTimeDate BETWEEN Time_Dim.Time_Start and 
                                                  Time_Dim.Time_End
INNER JOIN datamartend.dbo.Site_Type_Dim 
        ON incoming_measure.DBID = Site_Type_Dim.Site_Type_Code
INNER JOIN datamartend.dbo.Abandoned_Call_Dim 
        ON incoming_measure.Abandoned = Abandoned_Call_Dim.abandoned_value
INNER JOIN DataMartEnd.dbo.Work_Type_Dim 
        ON incoming_measure.DBID = work_type_dim.MIG_Site_ID AND
           Work_Type_Dim.Work_Type_Code = incoming_measure.Queue AND
           incoming_measure.StartTimeDate BETWEEN Work_Type_Dim.DimEffectiveStartDtm AND
                                                  Work_Type_Dim.DimEffectiveEndDtm
group by time_id, Site_Type_ID, Abandoned_ID, WorkType_ID

It is returning the correct results but is taking around 8 minutes to run, and I just wondered if anyone had any advice on how I can speed up the query? The main issue is that this is part of a project in which I have to demonstrate the end result to a client, and I am only allowed 10 minutes to demonstrate (University rules); this query is about 30% of the project.

The bulk of the estimated execution cost is the SORT, at 57%.

Richard C
  • Your query looks fine; create a stored procedure to run it every time – donstack Apr 01 '13 at 12:31
  • Any indexes present on the columns? Are 'staging' and 'datamartend' on the same server? – rene Apr 01 '13 at 12:34
  • It's really hard to tell anything without knowing your exact table structure, including indexes as well as data distribution. – Nenad Zivkovic Apr 01 '13 at 12:36
  • Yeah, I've got indexes on pretty much all of the columns used; would this make it quicker or slow it down? – Richard C Apr 01 '13 at 12:39
  • Please either supply the **actual** execution plan or at least answer [my question](http://stackoverflow.com/questions/15741345/estimated-execution-plan-sql-server-sort#comment22369601_15741345) from your previous question. – Martin Smith Apr 01 '13 at 12:40
  • If you are not using Express edition, please try Database Tuning Advisor - the icon for it should be in your toolbar. If it doesn't help the first time, try dropping all indexes and running it again – Nenad Zivkovic Apr 01 '13 at 12:41
  • Ok thanks, and @MartinSmith sorry, here is the link to the actual execution plan; appreciate your help: http://pastebin.com/Tuy69Nw7 In regards to your question earlier, it's 114094 – Richard C Apr 01 '13 at 12:43
  • @RichardC urgh... grab a screen shot also of the execution plan, please? – Pure.Krome Apr 01 '13 at 12:46
  • Estimated vs actual number of rows are way off. You are getting hash spills and sort spills, and generally the plan is much more expensive than estimated. Probably the 114,094 rows going into the final join is the killer though. The seek on `Time_Start <= StartTimeDate` may well mean that half the 70K row table is being scanned each time, assuming a row is equally likely to be `>` or `<`. So that would mean 3,993,290,000 rows are being processed in that join. – Martin Smith Apr 01 '13 at 12:47
  • Do you have any idea how I can speed it up? Sorry, it's my first time working with SQL Server and this sort of project. @MartinSmith thanks for the detailed help on this; do you have any suggestions on ways to get around the issue? – Richard C Apr 01 '13 at 12:48
  • You have to apply indexes to the table columns on which you are joining. @Marlin Pierce has already mentioned this – Rohit Vyas Apr 01 '13 at 13:35
  • Hi, yes, all the tables already have indexes on, but it's still running just as slow – Richard C Apr 01 '13 at 13:37
  • What's the granularity of your time dimension `Time_Dim`? An equals join to it based on a derived datetime value may be faster than the current `between` condition. –  Apr 01 '13 at 13:38
  • 15 minute periods is the grain – Richard C Apr 01 '13 at 13:48

4 Answers


You may find your query performs better if it is rewritten like this:

SELECT
    Time_ID,
    Site_Type_ID,
    Abandoned_ID,
    WorkType_ID,
    SUM (im.ring_time) AS Ring_Time,
    SUM (im.hold_time) As Hold_Time,
    SUM (im.talk_time) AS Talk_Time,
    SUM (im.acw_time) AS ACW_Time,
    COUNT(*) CallCount
FROM incoming_measure im
INNER JOIN DataMartEnd.dbo.Time_Dim td
        ON dateadd(mi,
                   15*floor(datediff(mi,
                                     dateadd(dd, datediff(dd,0,im.StartTimeDate), 0),
                                     im.StartTimeDate ) / 15),
                   dateadd(dd, datediff(dd,0,im.StartTimeDate), 0) 
                  ) = td.Time_Start
INNER JOIN datamartend.dbo.Site_Type_Dim std
        ON im.DBID = std.Site_Type_Code
INNER JOIN datamartend.dbo.Abandoned_Call_Dim acd
        ON im.Abandoned = acd.abandoned_value
INNER JOIN DataMartEnd.dbo.Work_Type_Dim wtd
        ON im.DBID = wtd.MIG_Site_ID AND
           im.Queue = wtd.Work_Type_Code AND
           im.StartTimeDate BETWEEN wtd.DimEffectiveStartDtm AND wtd.DimEffectiveEndDtm
group by time_id, Site_Type_ID, Abandoned_ID, WorkType_ID

- so that the time dimension is joined on an equality, rather than on a range ("between") condition.
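For anyone puzzling over the derived key in that join: it simply floors `StartTimeDate` down to the start of its 15-minute period. A worked example is below; the variable and column alias are mine, for illustration only. (Since `datediff` returns an integer, the integer division by 15 already truncates, so the `floor()` in the query above is belt-and-braces.)

```sql
-- Worked example for 2013-04-01 12:38:42:
--   dateadd(dd, datediff(dd, 0, @dt), 0)  strips the time -> 2013-04-01 00:00:00
--   datediff(mi, <midnight>, @dt)         = 758 minutes into the day
--   758 / 15 = 50 whole quarter-hours; 50 * 15 = 750 minutes
--   midnight + 750 minutes                = 2013-04-01 12:30:00
DECLARE @dt datetime = '2013-04-01T12:38:42';

SELECT dateadd(mi,
               15 * (datediff(mi, dateadd(dd, datediff(dd, 0, @dt), 0), @dt) / 15),
               dateadd(dd, datediff(dd, 0, @dt), 0)) AS Quarter_Hour_Start;
-- Returns 2013-04-01 12:30:00.000, i.e. the Time_Start of the matching 15-minute row
```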

If this does not significantly improve performance, then I suggest creating an indexed view on your existing query, and selecting from the indexed view as your new query - you can find more about creating indexed views here, while there is some information on their limitations here.
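Should the indexed-view route be needed, a minimal sketch of the shape it could take follows. The view and index names are mine, and several indexed-view requirements are assumed to hold: all tables consolidated into a single database (indexed views cannot reference other databases, so the staging/DataMartEnd split would have to go), `SCHEMABINDING` with two-part table names, `COUNT_BIG(*)` present, and the summed columns declared `NOT NULL`. Whether the range joins here pass every indexed-view restriction would need checking against the documentation.

```sql
-- Sketch only; see the caveats above.
CREATE VIEW dbo.Incoming_Measure_Summary
WITH SCHEMABINDING
AS
SELECT
    td.Time_ID,
    std.Site_Type_ID,
    acd.Abandoned_ID,
    wtd.WorkType_ID,
    SUM(im.ring_time) AS Ring_Time,
    SUM(im.hold_time) AS Hold_Time,
    SUM(im.talk_time) AS Talk_Time,
    SUM(im.acw_time)  AS ACW_Time,
    COUNT_BIG(*)      AS CallCount   -- COUNT_BIG(*) is mandatory in an indexed view with GROUP BY
FROM dbo.incoming_measure im
INNER JOIN dbo.Time_Dim td
        ON im.StartTimeDate BETWEEN td.Time_Start AND td.Time_End
INNER JOIN dbo.Site_Type_Dim std
        ON im.DBID = std.Site_Type_Code
INNER JOIN dbo.Abandoned_Call_Dim acd
        ON im.Abandoned = acd.abandoned_value
INNER JOIN dbo.Work_Type_Dim wtd
        ON im.DBID = wtd.MIG_Site_ID
       AND im.Queue = wtd.Work_Type_Code
       AND im.StartTimeDate BETWEEN wtd.DimEffectiveStartDtm AND wtd.DimEffectiveEndDtm
GROUP BY td.Time_ID, std.Site_Type_ID, acd.Abandoned_ID, wtd.WorkType_ID;
GO

-- The unique clustered index is what materialises the aggregation.
CREATE UNIQUE CLUSTERED INDEX IX_Incoming_Measure_Summary
    ON dbo.Incoming_Measure_Summary (Time_ID, Site_Type_ID, Abandoned_ID, WorkType_ID);
```

Once the clustered index exists, selecting from the view (with the `NOEXPAND` hint on non-Enterprise editions) reads the pre-aggregated rows instead of re-running the joins.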

  • Hi, that increased the performance greatly, thank you so much. Is there any way you can explain your inner join on Time_Dim in detail, as I'm not sure what you've done? It took the query down from 6 minutes to 6 seconds :) – Richard C Apr 01 '13 at 18:28
  • The expression `dateadd(...` rounds the datetime in `StartTimeDate` down to the nearest quarter of an hour (by truncating `StartTimeDate` to the start of the day, then adding the difference in quarters of an hour between the start of the day and `StartTimeDate` - see http://stackoverflow.com/questions/923295 ). An equijoin will normally be much faster than a join on a range of values. –  Apr 01 '13 at 18:37
  • Hi @Mark Bannister, sorry, I can't get my head around the code fully :/ Could you please go through this section line by line if possible? INNER JOIN DataMartEnd.dbo.Time_Dim td ON dateadd(mi, 15*floor(datediff(mi, dateadd(dd, datediff(dd,0,im.StartTimeDate), 0), im.StartTimeDate ) / 15), dateadd(dd, datediff(dd,0,im.StartTimeDate), 0) ) = td.Time_Start – Richard C Apr 01 '13 at 20:30
  • @RichardC - You may also be interested in [this Connect item](https://connect.microsoft.com/SQLServer/feedback/details/780746/add-sql-server-engine-support-for-interval-queries-intersection-overlap-and-other-allen-s-interval-algebra-relations) on interval queries. – Martin Smith Apr 02 '13 at 07:34
  • @RichardC: I'm not sure which bit you want clarified - I assume you understand the equijoin part (linking the derived value to `DataMartEnd.dbo.Time_Dim.Time_Start`), but aren't clear on how the nearest previous quarter of an hour expression is derived? If so, hopefully this SQLFiddle will make the derivation a bit clearer: http://sqlfiddle.com/#!3/27998/3 - note that the two `[Previous Quarter Hour datetime]` values are essentially identical in terms of their derivation. If this doesn't answer your question, please can you clarify which bit you don't understand? –  Apr 02 '13 at 08:34
  • @MarkBannister Hi, thanks for that. I'm fairly certain I understand the process and can see the results it achieves, but I'm just trying to comment the code line by line, and I think that's what's confusing me. I don't want to use the code without understanding it fully, you see. – Richard C Apr 02 '13 at 10:35

I think the performance problem is due to joins like this one:

FROM incoming_measure
INNER JOIN DataMartEnd.dbo.Time_Dim 
        ON incoming_measure.StartTimeDate BETWEEN Time_Dim.Time_Start and 
                                                  Time_Dim.Time_End

What is the granularity of Time_Dim? What is the granularity of StartTimeDate? The names suggest that one is measured in days and the other in hours, minutes, or seconds. This could result in lots of additional records being matched.

If you have a time dimension, why are you storing a regular date? If you have database date times, why are you using a time dimension table?

Also, you should give every table a readable alias. Trying to figure out something like:

SUM (staging.dbo.incoming_measure.ring_time) AS Ring_Time,

is much harder than:

SUM (im.ring_time) AS Ring_Time,

where im is a nice short alias for incoming_measure.

Gordon Linoff
  • The time_dim is in 15-minute periods and StartTimeDate is the start time of an actual call. Both are stored as DATETIMEOFFSET. I'm joining where the StartTimeDate of a call is between the start and end time of the time_dim table – Richard C Apr 01 '13 at 13:16

Try one of these; it may help you.

SELECT * FROM 
(SELECT  
ROW_NUMBER() OVER( PARTITION BY time_id, Site_Type_ID, Abandoned_ID, WorkType_ID ORDER BY time_id) No,
Time_ID,
    Site_Type_ID,
    Abandoned_ID,
    WorkType_ID,
    SUM (staging.dbo.incoming_measure.ring_time) OVER( PARTITION BY time_id, Site_Type_ID, Abandoned_ID, WorkType_ID) AS Ring_Time,
    SUM (staging.dbo.incoming_measure.hold_time) OVER( PARTITION BY time_id, Site_Type_ID, Abandoned_ID, WorkType_ID) As Hold_Time,
    SUM (staging.dbo.incoming_measure.talk_time) OVER( PARTITION BY time_id, Site_Type_ID, Abandoned_ID, WorkType_ID) AS Talk_Time,
    SUM (staging.dbo.incoming_measure.acw_time) OVER( PARTITION BY time_id, Site_Type_ID, Abandoned_ID, WorkType_ID) AS ACW_Time,
    COUNT(1)  OVER( PARTITION BY time_id, Site_Type_ID, Abandoned_ID, WorkType_ID) CallCount
FROM incoming_measure
INNER JOIN DataMartEnd.dbo.Time_Dim 
        ON incoming_measure.StartTimeDate BETWEEN Time_Dim.Time_Start and 
                                                  Time_Dim.Time_End
INNER JOIN datamartend.dbo.Site_Type_Dim 
        ON incoming_measure.DBID = Site_Type_Dim.Site_Type_Code
INNER JOIN datamartend.dbo.Abandoned_Call_Dim 
        ON incoming_measure.Abandoned = Abandoned_Call_Dim.abandoned_value
INNER JOIN DataMartEnd.dbo.Work_Type_Dim 
        ON incoming_measure.DBID = work_type_dim.MIG_Site_ID AND
           Work_Type_Dim.Work_Type_Code = incoming_measure.Queue AND
           incoming_measure.StartTimeDate BETWEEN Work_Type_Dim.DimEffectiveStartDtm AND
                                                  Work_Type_Dim.DimEffectiveEndDtm
) AS T1
WHERE No = 1

or

SELECT  Time_ID,
    Site_Type_ID,
    Abandoned_ID,
    WorkType_ID,
    SUM (staging.dbo.incoming_measure.ring_time) AS Ring_Time,
    SUM (staging.dbo.incoming_measure.hold_time) As Hold_Time,
    SUM (staging.dbo.incoming_measure.talk_time) AS Talk_Time,
    SUM (staging.dbo.incoming_measure.acw_time) AS ACW_Time,
    COUNT(1) CallCount
FROM incoming_measure
INNER JOIN DataMartEnd.dbo.Time_Dim 
        ON incoming_measure.StartTimeDate BETWEEN Time_Dim.Time_Start and 
                                                  Time_Dim.Time_End
INNER JOIN datamartend.dbo.Site_Type_Dim 
        ON incoming_measure.DBID = Site_Type_Dim.Site_Type_Code
INNER JOIN datamartend.dbo.Abandoned_Call_Dim 
        ON incoming_measure.Abandoned = Abandoned_Call_Dim.abandoned_value
INNER JOIN DataMartEnd.dbo.Work_Type_Dim 
        ON incoming_measure.DBID = work_type_dim.MIG_Site_ID AND
           Work_Type_Dim.Work_Type_Code = incoming_measure.Queue AND
           incoming_measure.StartTimeDate BETWEEN Work_Type_Dim.DimEffectiveStartDtm AND
                                                  Work_Type_Dim.DimEffectiveEndDtm
group by time_id, Site_Type_ID, Abandoned_ID, WorkType_ID
Harshil
  • Hi, thanks for the suggestion, but both these queries are taking just as long, if not longer, than the original – Richard C Apr 01 '13 at 13:17

You need indexes on the following fields:

Time_Dim.Time_ID

incoming_measure.DBID
incoming_measure.Queue
incoming_measure.Abandoned
incoming_measure.StartTimeDate

Site_Type_Dim.Site_Type_ID
Site_Type_Dim.Site_Type_Code

Abandoned_Call_Dim.Abandoned_ID
Abandoned_Call_Dim.abandoned_value

Work_Type_Dim.WorkType_ID
Work_Type_Dim.Work_Type_Code
work_type_dim.MIG_Site_ID

I'm not completely certain about the order of the fields in the indexes, so you may have to experiment, but I suggest:

create index Time_Dim_Time_ID on Time_Dim (Time_ID)
create index incoming_measure_index on incoming_measure (DBID, Queue, Abandoned, StartTimeDate)
create index Site_Type_index on Site_Type_Dim (Site_Type_ID, Site_Type_Code)
create index Abandoned_Call_index on Abandoned_Call_Dim (Abandoned_ID, abandoned_value)
create index Work_Type_index on Work_Type_Dim (WorkType_ID, Work_Type_Code, MIG_Site_ID)
Marlin Pierce
  • Hi, yeah, I've added those indexes but it's slowed the query down for some strange reason! ha – Richard C Apr 01 '13 at 13:33
  • Try adding indexes where Site_Type_ID, Abandoned_ID, WorkType_ID are moved to the last column in the list. (You do not need to drop indexes as you are trying to get the query faster. Drop them later, when you solve the problem.) – Marlin Pierce Apr 01 '13 at 14:00
  • Also, how much slower? Just a little bit, or much slower? – Marlin Pierce Apr 01 '13 at 14:02
  • Before it was around 6 minutes and then up to 12. I've added more indexes to the Time_Dim start and end time and it's back to the original 6 minutes. I think I'm just going to leave it at the 6 minutes; it could be due to my laptop – Richard C Apr 01 '13 at 14:24
  • If the time is because of the database processing, indexing is most of what you can do. If the time is because of data transfer across the network, then there is nothing you can do to reduce that except reduce the size of the result set (both reduce the number of records and the record size). I have used network sniffing to determine how much time is actually spent in the database, how much time is spent transferring the data, and how much is the remaining time, for the application to process the incoming data. – Marlin Pierce Apr 01 '13 at 15:17