
This is the SQL query I tried:

SELECT 
    TransactionNumber, Time
FROM 
    [HQMatajer].[dbo].[Transaction]
WHERE 
    Time>='2015-01-01' and Time<='2015-12-31'

The query returns 2.8 million records in total and takes 10 seconds.

I created an index on Time and TransactionNumber as well.

What else can I do to get the rows faster than this?

Thanks

  • Please use internationally recognized measures like thousands, millions, billions - not regional ones ("lakhs") other people won't know what they mean ... – marc_s Dec 31 '16 at 10:02
  • @CraigYoung both are in ASC order – mohamed faisal Dec 31 '16 at 10:07
  • @Craig I meant column order – Gurwinder Singh Dec 31 '16 at 10:07
  • @GurwinderSingh As I thought it was two indexes, I posted my answer – Turo Dec 31 '16 at 10:10
  • @GurwinderSingh it's a single index with two column names, and both columns are sorted ASC – mohamed faisal Dec 31 '16 at 10:11
  • 2.8 million rows is ***a lot of data***! It's going to take a "long" time no matter what. If you want to improve it, return less data. 2 options might yield marginal improvement (1) `CLUSTERED` index on `Time` (2) Index on `Time INCLUDE (TransactionNumber)` ... but either way your problem is not the query/index, it's the amount of data. – Disillusioned Dec 31 '16 at 10:13
  • @GurwinderSingh Yes, column order on indexes is very important. – Disillusioned Dec 31 '16 at 10:15
  • 2.8 million rows is not really a LOT of data, it's just not a small amount. If your data is going to continue to grow, consider partitioning it (e.g. per year), which would speed up access. – Jeff Watkins Dec 31 '16 at 10:24
  • @JeffWatkins 2.8 million is not a lot for data stored in a table. But it is a lot to return in a single `SELECT` statement. What exactly would someone be doing with 2.8 million rows that 10 seconds is considered slow? – Disillusioned Dec 31 '16 at 10:30
  • @CraigYoung This is not the final result. From the whole year's transaction numbers I still have to calculate the total sales. – mohamed faisal Dec 31 '16 at 10:33
  • @JeffWatkins I am only fetching one year's records here. I don't understand what you mean. Thanks – mohamed faisal Dec 31 '16 at 10:34
  • If the table continues to grow, it'll gradually take longer to retrieve records unless the index is clustered. In actuality, if you're performing this query a lot, you may want to consider pre-extracting the data into a data mart so subsequent access is much quicker. As Craig mentioned though, 10s for 2.8m rows isn't exactly slow. – Jeff Watkins Dec 31 '16 at 10:55
  • @mohamed btw, I did a little benchmark test `SELECT TOP 2800000 IntCol, DateCol` from a table with only those 2 columns and 10 million rows. This is the fastest you could possibly achieve because there is no filtering, no extra processing - the query just returns the first 2.8 million rows it can find. Result: Server time to get the data `1 second`. Total elapsed time which includes sending the data to the SSMS client (on the same machine, a low-end laptop) and loading it into the grid: `10 seconds`. Conclusion, your query is probably fast enough on your hardware. Optimise elsewhere. – Disillusioned Jan 01 '17 at 07:15

2 Answers


2.8 million rows is a lot of data! It's going to take a "long" time no matter what ("long" being relative: 10 seconds is reasonable for 2.8 million rows). If you want to improve it, return less data.

There are 2 options that might yield marginal improvement:

  • Create a CLUSTERED index on the Time column. Note that a table is only allowed 1 clustered index, so this might not be a good choice when other factors are considered.
  • Create an index on Time that INCLUDEs TransactionNumber. (Sketches of both options follow this list.)
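
As a very rough sketch, those two options could look like this (the index names here are illustrative, not taken from your schema):

-- Option 1: cluster the table on Time. A table can have only one
-- clustered index, so this competes with any existing clustered key.
CREATE CLUSTERED INDEX IX_Transaction_Time
    ON [HQMatajer].[dbo].[Transaction] (Time)

-- Option 2: a covering nonclustered index; the query can then be
-- answered from the index alone, without touching the base table.
CREATE NONCLUSTERED INDEX IX_Transaction_Time_Covering
    ON [HQMatajer].[dbo].[Transaction] (Time)
    INCLUDE (TransactionNumber)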

But either way your problem is not the query/index, it's the amount of data.


Additional Info

  • Using 2 separate indexes as you have is not going to be of any help; your query will only use 1 index (whichever the optimizer considers best).
  • I have assumed that your TransactionNumber really is a numeric data type and not one of the horrific "string numbers" I see far too often. If you are using some form of varchar to store your TransactionNumber, you would do well to change it to int or bigint. These types require fewer bytes to store ==> less data ==> better performance.
  • Your query currently has a bug, unless you truncate the time portion of your Time column. Clearly you intend to return all transactions for the year, but currently you will miss any transactions on 31 December that did not happen at exactly 00:00. To fix this, time range queries should be filtered as follows:
    • Time >= StartOfRange and Time < StartOfNextRange
    • E.g. Time >= '2015-01-01' and Time < '2016-01-01'
    • NB: Note the strictly < in the second condition. This includes all transactions before the very beginning of the next year.
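
Applied to the query from the question, the corrected filter looks like this:

SELECT 
    TransactionNumber, Time
FROM 
    [HQMatajer].[dbo].[Transaction]
WHERE 
    -- Half-open range: covers every moment of 2015 and nothing more
    Time >= '2015-01-01' and Time < '2016-01-01'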

Math

Some calculations on why I say 2.8 million rows is "a lot of data to return".

  • You return TransactionNumber and Time. I'll assume 4 and 8 bytes per row respectively. (It could be more but this is a very conservative estimate.)
  • That's 12 bytes * 2,800,000 = 33,600,000 bytes (approximately 32 MB).
  • This is just the minimum data returned. Bear in mind that the data processed is much more, including other columns of your table and the index trees.
  • To transfer 32 MB of data in 10 seconds between server and client would require 32 MB * 8 / 10 = 25.6, i.e. an average transfer rate of over 25 megabits per second.
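
To sanity-check that arithmetic yourself, here is the same estimate as a quick ad-hoc query (it just reproduces the numbers above):

SELECT
    12 * 2800000                      AS TotalBytes,       -- 33,600,000 bytes
    12 * 2800000 / 1048576.0          AS TotalMB,          -- approx. 32 MB
    12 * 2800000 * 8 / 1048576.0 / 10 AS AvgMegabitsPerSec -- approx. 25.6 Mbit/s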

Conclusion

Whatever it is you're trying to do that requires 2.8 million rows possibly needs a rethink on your approach. However, if you're talking about a once/year job - then is 10 seconds really worth worrying about?


Maybe if you put more of the calculations up front in variables, where they will only be executed once, you'll get better performance. It probably won't make much of a difference, and may even be detrimental to performance, but it is interesting to try.

-- Seconds from the start of the year to its midpoint
DECLARE @midwaySeconds int = DATEDIFF(s, '2015-01-01 00:00:00', '2015-12-31 23:59:59') / 2
-- The point in time halfway through the year
DECLARE @midwayDate datetime2 = DATEADD(s, @midwaySeconds, '2015-01-01 00:00:00')
SELECT 
    TransactionNumber, Time
FROM 
    [HQMatajer].[dbo].[Transaction]
WHERE 
    -- Keep rows whose distance from the midpoint is under half a year
    ABS(DATEDIFF(s, Time, @midwayDate)) < @midwaySeconds
  • OP doesn't have any "calculations". `Time` is evaluated against 2 constant date-literals. Your suggested change actually ***adds*** a calculation on the `Time` column; which also makes the query [non-Sargable](http://stackoverflow.com/q/799584/224704). As a result, it becomes impossible for the query to benefit from the correct index; It has to scan ***all rows***. (This is about the worst advice possible.) – Disillusioned Jan 01 '17 at 23:25
  • Thanks for setting me straight on the performance hit taken when adding functions to where clauses. The non-Sargable link you included was very helpful. – Chad Stellrecht Jan 02 '17 at 13:05