
I have a database with millions of rows, and I want to create an index to fix a performance issue: the query below is over-using the CPU. The database is hosted on MS Azure.

For example, I have a query similar to the following to get employees who are not managers:

SELECT name, position, JobTitle FROM Employees WHERE name LIKE '{0}%' AND manager = 0 AND employee = 1 ORDER BY senior DESC OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY

manager, employee and senior are all boolean columns (senior employees should show first in the results).

We have tried something similar to the following but the server is still unable to handle the traffic:

CREATE NONCLUSTERED INDEX [IX_Index] ON [dbo].[Employees]
(
    [Manager] ASC,
    [Employee] ASC,
    [Name] ASC
)
INCLUDE ([Position], [JobTitle], [Senior])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

The execution plan shows:

47% on the index seek and 53% on the Top N Sort when the query is run from SSMS

17% index seek, 26% parallelism, 56% sort for the tracked queries coming from the web app

(both the web app and the database are hosted on Azure)

What would be a proper index?

Edit:

I have received a suggestion to use a filtered index (e.g. WHERE manager = 0 AND employee = 1), but SQL Server won't select it and I'm not able to force it. Any suggestions?
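
For reference, a sketch of the kind of filtered index that was suggested (the index name is illustrative, not the one actually tried). A common reason the optimizer won't pick a filtered index is that the query arrives parameterized, so the filter can't be matched at plan-compile time; OPTION (RECOMPILE), or an explicit index hint, is a possible workaround:

CREATE NONCLUSTERED INDEX [IX_Employees_NonManagers] ON [dbo].[Employees]
(
    [Name] ASC
)
INCLUDE ([Position], [JobTitle], [Senior])
WHERE [Manager] = 0 AND [Employee] = 1;

-- If the manager/employee predicates reach the server as parameters (e.g. from
-- the web app), the optimizer cannot match them to the index filter at compile
-- time. OPTION (RECOMPILE) lets it see the actual values, and the table hint
-- forces this index if it is still not chosen.
SELECT name, position, JobTitle
FROM dbo.Employees WITH (INDEX([IX_Employees_NonManagers]))
WHERE name LIKE '{0}%' AND manager = 0 AND employee = 1
ORDER BY senior DESC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
OPTION (RECOMPILE);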

Techy
  • Show the execution plan please. I'd suggest placing the most selective column first: `[Name] ASC, [Manager] ASC, [Employee] ASC`. – Serg Jul 29 '17 at 14:18
  • @Serg please see edit above. Thanks a lot. – Techy Jul 29 '17 at 14:31
  • @Serg Also, would it be best to have "Senior" at the beginning or inside the INCLUDE, since we have it in the ORDER BY? – Techy Jul 29 '17 at 15:22
  • @Techy `SELECT name, position, job title WHERE name LIKE '{0}%'` Do you use String.Format to set the value for the LIKE? If so, you are more than likely vulnerable to SQL injection. – Lukasz Szozda Jul 29 '17 at 15:34
  • @Serg Placing `name` first in the index means that the seek will need to remove those rows not matching the manager and employee criteria as a residual predicate. Best to have these columns first and name third. So it can do an equality seek on those two columns then a range seek on the third column to get the names starting with the desired string. – Martin Smith Jul 29 '17 at 15:52
  • I disagree. The first column in the index should be the one that returns the fewest rows. I wouldn't create an index to tune a single query. The index needs to support all likely queries. – LAS Jul 29 '17 at 16:19
  • @LAS The question is how to tune this specific query. We have no idea what other queries the OP's workload may contain or their relative importance. Your suggestion will make performance of this specific query **worse** than the index they already have in place when asking the question. – Martin Smith Jul 29 '17 at 16:28
  • @LAS thank you. Just to confirm, we only use this query on this index; what do you think? – Techy Jul 29 '17 at 17:48
  • @Techy, I'm afraid this debate is making it harder for you. So I'd suggest trying our suggestions to see which works best for your situation. In my experience and through SQL tuning courses, I've tried to make sure indexes return < 2% of the data. When your query returns more rows than that, a table scan will often be faster. But the only way to know for sure is to try different combinations of the index. – LAS Jul 29 '17 at 20:04
  • @LAS I have also received a suggestion from MSDN to create an index with filters, which I implemented, but I'm having a table deadlocking issue and can't delete the old indexes. Thanks a lot – Techy Jul 29 '17 at 20:13
  • @Techy If you can't drop the index, it's because it's being used. Deadlocking happens when two sessions are waiting for the same resource. Oracle will roll back one of the transactions, but I'm not sure about SQL Server. If you truly have a deadlock, you need to kill the deadlocked sessions. I'm assuming you're in a test environment. Once the resources release the index, you should be able to drop it. – LAS Jul 29 '17 at 20:49
  • @LAS please see edit above. Thanks :) – Techy Jul 29 '17 at 22:55
  • @Techy if SQL Server isn't using the index, it's because it thinks not using it is more efficient. What version of SQL Server are you running? I found a thread that says OFFSET/FETCH isn't supported in v2008. https://dba.stackexchange.com/questions/136259/offset-fetch-in-sql-server-2008. – LAS Jul 30 '17 at 00:32
  • sqlmag.com/t-sql/offsetfetch-part-1. This article is by Itzik Ben-Gan. I still think your issue is one of volume. The ORDER BY is done last, so even though you're asking for 10 rows, you're processing 17 million first. – LAS Jul 30 '17 at 16:16
  • @LAS thank you so much. Any recommendation on this matter is appreciated. – Techy Jul 30 '17 at 21:36
  • Instead of saying "47% on index seek and 53% on top N sort if I search from SSMS; 17% index seek, 26% parallelism, 56% sort on tracked queries from webapp" you could always post the actual plan as XML (not a screenshot); it would be a lot more helpful – TheGameiswar Jul 31 '17 at 19:16

2 Answers


You say senior is a boolean, and it is the only thing you are sorting on. Your current order by is not specific enough to use paging reliably; you need to add secondary items such as name to make it deterministic.
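
For example, a deterministic version of the paging query might look like this (a sketch reusing the table and column names from the question):

SELECT name, position, JobTitle
FROM   dbo.Employees
WHERE  name LIKE '{0}%'
       AND manager = 0
       AND employee = 1
ORDER BY senior DESC, name ASC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;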

To optimise this query you can first add this index

CREATE NONCLUSTERED INDEX [IX_Index]
  ON [dbo].[Employees] ( [Senior] ASC, [Name] ASC )
  INCLUDE ( [Position], [JobTitle], [Manager], [Employee])
WHERE Manager = 0 AND Employee = 1;

Then rewrite the query to break it into two parts: one retrieving the seniors and one retrieving the non-seniors.

The following uses efficient seeks and no sort operators.

DECLARE @Offset INT = 0, 
        @Fetch INT = 10,
        @SeniorCount INT;

DECLARE @Result TABLE (
    ResultId INT IDENTITY PRIMARY KEY,
    Name varchar(50) NULL,
    Position varchar(50) NULL,
    JobTitle varchar(50) NULL
) 


-- Collect enough seniors (in name order) to cover the requested offset + page size
INSERT INTO @Result
SELECT TOP (@Fetch + @Offset) name,
                position,
                JobTitle
FROM   dbo.Employees
WHERE  name LIKE '{0}%'
       AND manager = 0
       AND employee = 1
       AND Senior = 1
order by name;

-- Remember how many rows the seniors contributed
SET @SeniorCount = @@ROWCOUNT;


-- Top up with non-seniors only if the seniors did not fill @Offset + @Fetch rows
INSERT INTO @Result
SELECT TOP (@Fetch + @Offset - @SeniorCount) name,
                position,
                JobTitle
FROM   dbo.Employees 
WHERE  name LIKE '{0}%'
       AND manager = 0
       AND employee = 1
       AND Senior = 0
order by name
OPTION (RECOMPILE);

-- Return just the requested page from the combined result
SELECT *
FROM @Result
WHERE ResultId > @Offset AND ResultId<= @Offset + @Fetch
ORDER BY ResultId;
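
For example, with @Offset = 10 and @Fetch = 10 the first INSERT collects up to 20 matching seniors in name order; if only 7 exist, @SeniorCount is 7 and the second INSERT tops the table up with at most 13 non-seniors, so @Result holds the first 20 rows of the combined ordering and the final SELECT returns rows 11 to 20.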


Martin Smith
  • Thanks a lot. We need paging though. – Techy Jul 30 '17 at 01:24
  • @Techy your current `order by` is not specific enough to use paging reliably. You need to add secondary items such as name to make it deterministic. The same basic approach can be used with paging though. You just need to materialise the first result into a table variable or temp table. Then you can select what is needed from that and calculate what is needed from the second query. – Martin Smith Jul 30 '17 at 11:50

Your WHERE clause structure isn't well-suited for index use. This is going to get complicated, so bear with me:

Manager and employee are low-variability columns, so a B+-tree index on them won't help much. B-tree indexes work well when a very small number of rows are returned, and every row in your table will be either an employee or a manager, so you're potentially searching every row. Your highest-variability candidate is name, so it should be first in the index. However, in this particular query the LIKE predicate may cause the optimizer to skip the index; if the optimizer recognizes that the prefix of the name is known, it may use the index for a partial search. Either way, I'd structure the index as follows:


    CREATE NONCLUSTERED INDEX [IX_Index] ON [dbo].[Employees]
    (
     [Name] ASC,
     [Manager] ASC,
     [Senior] DESC
    )
    INCLUDE ([Employee], [Position], [JobTitle])

The best index structure for low-variability data like manager and employee is a bitmap index, which SQL Server doesn't provide at this time. I only include Manager and Senior in the index because you're using OFFSET-FETCH. I'm assuming that if a row is not a manager it is an employee, so I only need one of them in the index. I'm not sure how the results are being used; I'm going to guess that you're paging them back into a user interface. In that case, consider the distribution of your data. I counted the number of names in my own names table and got the following distribution:

 First Letter   Count   Percent of Total
 A               22911    3.31
 B               58989    8.5
 BA              13442    1.94
 ...

My table is much smaller than yours, but I believe what I'm saying will scale. You need to reduce the number of rows "found". In your query, make sure the name string {0} has enough characters that the percentage found is low, say 5% of your total number of rows. If {0} is null, change it to an 'A'; if {0} is a 'B', change it to 'BA'. If you really want to return all employees when {0} is null, then I don't think an index will help you. Indexes are used to reduce the result set.
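
A minimal sketch of that guard in T-SQL (the parameter name and the two-character threshold are illustrative, not from the original query):

DECLARE @Prefix varchar(50) = '{0}';   -- value supplied by the application

-- Only run the expensive search once the prefix is selective enough;
-- otherwise ask the user to type more characters.
IF LEN(@Prefix) >= 2
BEGIN
    SELECT name, position, JobTitle
    FROM   dbo.Employees
    WHERE  name LIKE @Prefix + '%'
           AND manager = 0
           AND employee = 1
    ORDER BY senior DESC, name
    OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
END
ELSE
BEGIN
    RAISERROR('Please enter at least two characters.', 16, 1);
END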

LAS
  • Thanks a lot, I'm wondering why you don't want me to include other columns. There are many other columns and the table has over 22 million rows, so I'm afraid that retrieving these columns would add a lot of load and slow the query. What do you think? – Techy Jul 29 '17 at 16:13
  • Putting `name` first in the index is **not** a good optimisation for this query. With this arrangement it can do a range seek on name but still needs to read and discard rows not matching the `Manager` and `Employee` predicate. With these columns first it just needs a range seek for a section of the index where names are guaranteed to match the `manager = 0 AND employee = 1` condition. – Martin Smith Jul 29 '17 at 16:15
  • This also does nothing to get rid of the sort operator. – Martin Smith Jul 29 '17 at 16:22
  • Still don't agree. Consider the cardinality. Say 80% of the rows are employees and 20% managers. Now say I'm searching for all employees with the name prefix DAV. You're saying you'd first return all employee rows (22 million * 0.8 = 17.6 million) and then scan for DAV? I'd look for DAV (50 rows in my local phone book) and then scan for employee status. – LAS Jul 29 '17 at 17:00
  • @LAS so if I remove the INCLUDE, is there a way to optimize the sorting by "senior", since it consumes a lot? Should I add it to the composite key? – Techy Jul 29 '17 at 17:46
  • RE: "don't agree" This is not the correct mental model for how a composite index works. The rows in the index in the Q are logically stored in the same sort order as `ORDER BY Manager, Employee, Name`. The upper levels of the indexes contain details of the lowest possible *composite* key value on each page at the level underneath. This can be efficiently navigated to find the first key matching (0,1,Dav%). Then the keys can be read in order until the upper bound of the range is reached. See my answer here https://stackoverflow.com/a/39080819/73226 – Martin Smith Jul 29 '17 at 18:22
  • Or if you don't want to take my word for it see http://use-the-index-luke.com/sql/myth-directory/most-selective-first – Martin Smith Jul 29 '17 at 18:26
  • Quote from the given thread: "It’s useless to have the most selective column of the index on the left if very few queries filter on it. Queries that don’t filter on it, but do filter on the other columns of the index will have to scan, and scans are expensive." Martin, do you really think Manager will be queried more often than name? – LAS Jul 29 '17 at 19:56
  • Quite possibly not but there can be other indexes to support queries on name. The question here is how to optimize the specific query they have identified as having performance problems. The answer to that question is not to make the existing index worse suited to the query. I also don't subscribe to your blanket view on included columns. In many systems data is read much more frequently than written so optimising for reads makes good sense. – Martin Smith Jul 29 '17 at 20:20
  • @MartinSmith I have received a suggestion on MSDN to create a filtered index and it makes a lot of sense, but I have tried it and SQL Server is not choosing it, nor was I able to force it. Do you think there is a way around it? Please see the edit above – Techy Jul 29 '17 at 22:52
  • I've already given you my suggestion in my answer. That index should be able to be made filtered with no problem. – Martin Smith Jul 29 '17 at 23:05
  • Added covered columns back to index after reading white paper on offset-fetch – LAS Jul 30 '17 at 19:36