Microsoft SQL query optimization

Question

I created a piece of SQL using a cursor to get some values from a table, however it's incredibly inefficient and takes quite a while to execute. I need to use the results for an SSRS candlestick graph, so I need to collect the min value, max value, and open and close values, however I'm not very familiar with SQL and need to optimize. The data in the table gets a new timestamp and pipelinecount every 5 minutes. Here's the code I have so far:

declare
@min int,
@max int,
@open int,
@close int,
@date date

create table ##Candle
(
    MinVal int,
    MaxVal int,
    OpenVal int,
    CloseVal int,
    CalDate date
)

DECLARE C1 CURSOR LOCAL FOR SELECT CONVERT(date,TimeCollected) as CalendarDate from data.PipelineCount where TimeCollected > dateadd(mm,-1,CONVERT(date,GETDATE())) order by CalendarDate;
OPEN C1;
FETCH NEXT FROM C1 INTO @date;

WHILE @@FETCH_STATUS = 0
BEGIN
    select
    @min = min(PipelineCount),
    @max = max(PipelineCount)
    from data.PipelineCount
    where convert(date,TimeCollected) = @date;

    select top 1
    @open = PipelineCount
    from data.PipelineCount
    where convert(date,TimeCollected) = @date and datepart(hour,TimeCollected) = 8;

    select top 1
    @close = PipelineCount
    from data.PipelineCount
    where convert(date,TimeCollected) = @date and datepart(hour,TimeCollected) = 17;

    insert into ##Candle values(@min,@max,@open,@close,@date);
FETCH NEXT FROM C1 INTO @date;
END
CLOSE C1;
DEALLOCATE C1;

Anyone have any ideas to help?

Yes. Stop using a cursor to insert like this. Just create a select statement to get the rows you need. And also don't use global temp tables, they are fraught with issues of concurrency. — Sean Lange, Feb 03 '20 at 21:57
um - you have but a single price during the time from 8:00 AM to 8:59:59 AM? That seems incorrect. Is not the open price the very **first** (by time) for a given date? — SMor, Feb 03 '20 at 22:00
I've tried a few different things to get the data, I can get the min and max with either of the other selects but I haven't figured out a way to only return one count for each the time requirements. — Mark Torlone, Feb 03 '20 at 22:00
@SMor there is a new entry every 5 minutes, the open time is the 8th hour and close time is the 17th hour, selecting the top 1 from the hour would always return the first entry of that hour yes? — Mark Torlone, Feb 03 '20 at 22:02
Nope - you use TOP without an order by clause. Under lightly loaded conditions this might appear to give you what you desire, but there is not guarantee You also assume that at least one price exists for that time period. Perhaps that is typical - but beware your assumptions. If no such price exists, guess what happens? The value from the previous loop iteration remains in the variable. That is the danger of using select to assign a scalar variable. — SMor, Feb 03 '20 at 22:06
If you are using a current version, you can use first_value (and last_value) in a manner similar to [this post](https://stackoverflow.com/questions/41840829/using-group-by-with-first-value-and-last-value). — SMor, Feb 03 '20 at 22:11
Does PipelineCount have a unique ID such is "id" so you can inner join the table to itself, like this: select min(p1.PipelineCount),max(p2.PipelineCount) from PipelineCount p1 inner join PipelineCount p2 on p1.id = p2.id — smoore4, Feb 03 '20 at 22:47
Thank you @Smor, that makes perfect sense. I'll try using those. — Mark Torlone, Feb 04 '20 at 01:00
I do have a RecordId column that's an int @smoore4, so I can do an inner join on that. — Mark Torlone, Feb 04 '20 at 01:01

score 0 · Answer 1 · answered Feb 04 '20 at 00:28

Your basic query performance problem lies in this WHERE clause

where convert(date,TimeCollected) = @date

You want that clause to be able to use an index on your TimeCollected column in your table. Try something like this:

where TimeCollected >= @date
  and TimeCollected < DATEADD(DAY, 1, @date)

Why is this better? When you apply a function to a column in a where clause, the server must look at every row to see if it matches, in a so-called full table scan. My suggested refactoring of your where clause lets the server do a range scan on an index, so it looks at far fewer rows. In this case it only looks at rows between @date and one day later.

With respect, I don't understand the logic in your example query so I can't refactor the whole thing for you. But making your where clauses "sargable", as I suggest, will help a lot. Read this. What makes a SQL statement sargable?

By the way, if you right-click in SSMS's query panel, you can check an option asking for the "actual query plan". Looking at that can teach you a lot about why your query is slow. It even recommends indexes if need be.

What I'm trying to do is retrieve the minimum value, the maximum value, the opening value (the value at hour 8), and the closing value (the value at hour 17) of PipelineCount. What gets stored in this database is an int Record number(primary key), TimeCollected as datetime, and PipelineCount as int. I couldn't figure out a way to do it in a select statement so I created a cursor to iterate each day and get those values. Thank you very much for your input, I'll make this change and see how it affects my query. — Mark Torlone, Feb 04 '20 at 00:59

Microsoft SQL query optimization

1 Answers1