0

I have a set of tables

dbo.Store_000
dbo.Store_001
....
dbo.Store_216 

containing the sales of different stores, where relevant fields are

Username, ItemID, Description, CreatedDate, CountryID

I need to extract the latest 20 sales from all Stores and I wrote the following query:

select top 20 UserName, ItemID, Description, CreatedDate, CountryID
FROM (  
    SELECT UserName, ItemID, Description, CreatedDate, CountryID FROM dbo.Store_000 
    UNION ALL
    SELECT UserName, ItemID, Description, CreatedDate, CountryID FROM dbo.Store_001 
    UNION ALL
    SELECT UserName, ItemID, Description, CreatedDate, CountryID FROM dbo.Store_002
    .....
    ... 
    UNION ALL
    SELECT UserName, ItemID, Description, CreatedDate, CountryID FROM dbo.Store_216                   
) ii
order by ii.createdDate desc

currently tables contains some 200M records (all together)

therefore to reduce processing time I created the following index for each table:

CREATE NONCLUSTERED INDEX I2_Store000 ON dbo.Store_000
  (UserName)
INCLUDE (itemId, Description, CreatedDate, CountryID)
WITH (
  PAD_INDEX = OFF,
  DROP_EXISTING = OFF,
  STATISTICS_NORECOMPUTE = OFF,
  SORT_IN_TEMPDB = OFF,
  ONLINE = OFF,
  ALLOW_ROW_LOCKS = ON,
  ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]

but it still takes too much (some minutes on our machine)

can you suggest how to improve the query and or the index to reduce processing time?

should I create the index based on CreatedDate since that is the sorting field?

if it can help, HERE can find the MSSMS execution plan.

thanks

Joe
  • 1,033
  • 1
  • 16
  • 39
  • Eralper is correct. Also, you are ignoring the Data Set Theory by treating records as cursors and essentially running 21 different queries. The internet is full of productive tutorials on creating queries, such as thoughtbot.com, blog.SQLAuthority, and TECHNET. For a quick refresher on querying, check out [Thoughtbot's page](https://robots.thoughtbot.com/back-to-basics-sql) – clifton_h Jun 30 '16 at 06:56
  • Hi, what do you mean for `Data Set Theory by treating records as cursors and essentially running 21 different queries`? where can I find something about that. – Joe Jun 30 '16 at 07:58
  • One way of saying database languages are not line by line languages like C++ and treat tables (data sets) as a whole. Check out SQLMag's [T-SQL Foundations: Thinking in Sets](http://sqlmag.com/t-sql/t-sql-foundations-thinking-sets) – clifton_h Jun 30 '16 at 08:10
  • Another way of describing the concept is when you look at a book, you see each page as part of the work, a relation, that has pages and chapters to identify sections of the data. This makes comparing and searching the information MUCH faster than having to go page by page to find a certain chapter or section. – clifton_h Jun 30 '16 at 08:19
  • Yes, I understood the concept but need to find some example how to apply the concept into real world.. will check the link you passed me. – Joe Jun 30 '16 at 08:38

3 Answers3

2

Just for performance, I should only select top 20 rows from each table ordered by createdDate, then select a new top 20 from the new derived table which is about 17 table * 20 = 340 rows

I hope it will be easier to deal with smaller table, I really wonder the result.

Eralper
  • 6,461
  • 2
  • 21
  • 27
  • Hi, thanks for prompt answer.. actually situation has improved, but not enough: now takes some 4 minutes, but removing inner sort it takes around 1 second, therefore I think I have to create an index based on createdDate to avoid sort, but since I have another index, I have to understand if it is better (and possible) to expand this index or create a second index – Joe Jun 30 '16 at 07:08
  • @Joe What is that index for? The only "predicate" you have is sorting by `createdDate` and `TOP`. How could be index on `UserName` helpful here? – Ivan Starostin Jun 30 '16 at 08:08
  • well, that actually is an index I already have on that table since I need to join this table with other on Username, and I add `CreatedDate` in the include but does not help so much, on the other side I would like to create a further index on some 200M records – Joe Jun 30 '16 at 08:54
  • @IvanStarostin indexes allow for logical/physical ordering of the data. So instead of comparing all or many of the rows on a column that has no index, SQL Server can choose the most optimal query plan available. But yeah, comparisons are needed. – clifton_h Jun 30 '16 at 09:33
  • @Joe _"I add CreatedDate in the include but does not help"_ - could it? Are columns in `INCLUDE` part sorted in index? How does sql optimizer choos an index to use? Create new index **starting** with _CreatedDate_. – Ivan Starostin Jun 30 '16 at 09:37
  • Hi @Joe, you can create an index on each sub-table on field createdDate . This will definitely minimize query time. So keep on selecting 20 rows on indexed fields of each table, then get the latest 20 from over all. The outer sorting will not take too much time since the row count will be only around 350.. – Eralper Jun 30 '16 at 11:19
0

The first (and quicker) option would be do carry out a TOP 20 on your subqueries before you then sort in the outer select, something like this;

select top 20 UserName, ItemID, Description, CreatedDate, CountryID
FROM (  
    SELECT  TOP 20 UserName, ItemID, Description, CreatedDate, CountryID FROM dbo.Store_000 ORDER BY CreatedDate DESC
    UNION ALL
    SELECT TOP 20 UserName, ItemID, Description, CreatedDate, CountryID FROM dbo.Store_001 ORDER BY CreatedDate DESC
    UNION ALL
    SELECT TOP 20 UserName, ItemID, Description, CreatedDate, CountryID FROM dbo.Store_002ORDER BY CreatedDate DESC
    .....
    ... 
    UNION ALL
    SELECT TOP 20 UserName, ItemID, Description, CreatedDate, CountryID FROM dbo.Store_216   ORDER BY CreatedDate DESC                
) ii
order by ii.createdDate desc

The index you're going to want on each table will be like this (index name just an example);

CREATE NONCLUSTERED INDEX [IX_Store_000_CreatedDate_Desc_Incl] ON [dbo].[Store_000] ([CreatedDate] DESC)
INCLUDE ([UserName],[ItemID],[Description],[CreatedDate],[CountryID]) 

Your other option is to create an indexed view if you're going to be calling this regularly. There are upsides and downsides to an indexed view so you will have to make that call yourself, further reading below;

https://www.simple-talk.com/sql/learn-sql-server/sql-server-indexed-views-the-basics/

https://www.brentozar.com/archive/2013/11/what-you-can-and-cant-do-with-indexed-views/

SQL Server - Creating an Indexed View

Community
  • 1
  • 1
Rich Benner
  • 7,873
  • 9
  • 33
  • 39
0

There are other techniques you could use that could optimize the query. Now, UNION ALL is appropriate since the values from each table should be unique.

Alex Martelli from What is the difference between JOIN and UNION?

UNION puts lines from queries after each other, while JOIN makes a cartesian product and subsets it -- completely different operations. Trivial example of UNION:

mysql> SELECT 23 AS bah
    -> UNION
    -> SELECT 45 AS bah;
+-----+  
| bah |
+-----+   
|  23 |
|  45 | 
+-----+ 2 rows in set (0.00 sec) similary trivial example of JOIN:

mysql> SELECT * FROM 
    -> (SELECT 23 AS bah) AS foo 
    -> JOIN 
    -> (SELECT 45 AS bah) AS bar
    -> ON (33=33);
+-----+-----+ 
| bah | bah |
+-----+-----+ 
|  23 |  45 | 
+-----+-----+ 1 row in set (0.01 sec)
  • USE A PREDICATE IN ALL 20+ tables.

UNION ALL may be effective, but it still has to sort the ENTIRE TABLE to identify the TOP 20. Instead, assuming the stores always have more than 20 sales in a given 30-day period, use DATEADD or DATEDIFF to enable boolean comparisons that limit the number of rows SQL Server has to return with before comparing.

  • USE AN INDEX...ON THE RIGHT COLUMN

Obviously, your report should use INDEXES and logical ones at that. CLUSTERED INDEXES are less expensive to build and maintain.

You know names might be unique, but SQL Server is not comparing names...it is comparing DATES. Therefore, create an INDEX on createdDate.

  • USE Search ARGuments (SARGs)

Even the best laid plans can be ruined if SQL Server thinks it must go through every record every time. When a FUNCTION or variable are used on both sides of the predicate like below:

createdDATE BETWEEN DATEADD(DD, createdDate, 30) AND CAST(GETDATE() AS DATETIME2)

SQL Server has no choice but to go through the ENTIRE table or INDEX list (TABLE/INDEX SCAN) before comparing values. Instead, reorder it like this:

createdDATE >= DATEADD(DD, -30, GETDATE() )

Compare the results with this query and how an INDEX on createdDate improves the time:

SELECT UserName, ItemID, Description, CreatedDate, CountryID
FROM (  
    SELECT TOP 20 UserName, ItemID, [Description], CreatedDate, CountryID FROM dbo.Store_000 
    WHERE CreatedDate >= DATEADD(DD, -30, GETDATE())
    UNION ALL
    SELECT TOP 20 UserName, ItemID, [Description], CreatedDate, CountryID FROM dbo.Store_001 
    WHERE CreatedDate >= DATEADD(DD, -30, GETDATE())
    UNION ALL
    SELECT TOP 20 UserName, ItemID, [Description], CreatedDate, CountryID FROM dbo.Store_002 
    WHERE CreatedDate >= DATEADD(DD, -30, GETDATE())
    .....
    ... 
    UNION ALL
    SELECT TOP 20 UserName, ItemID, [Description], CreatedDate, CountryID FROM dbo.Store_216 
    WHERE CreatedDate >= DATEADD(DD, -30, GETDATE())
) ii
-- ORDER BY CreatedDate DESC    /*Unless you must, drop the ORDER BY. */
Community
  • 1
  • 1
clifton_h
  • 1,298
  • 8
  • 10