4

I am a software developer and I was recently approached by our DBA to optimize a query that an app of mine uses. The DBA reported that the query takes about 50% of CPU and causes high I/O when it runs. The query is pretty straightforward and I am unsure how to optimize it.

Question 1: How can I optimize this query?

Question 2: Is it even my job to do so? Shouldn't the DBA be more knowledgeable about this? Mind you, we have no DB developers, just a DBA and software developers.

The DB has approximately 30-50 million records. It is constantly maintained/monitored by the DBA, but I am unsure how. The server is on a dedicated machine and runs Microsoft SQL Server 2005 - 9.00.5057.00 (X64).

PS: Please do not suggest improving the DB through structural changes. I know it's a bad design to have currency stored as varchar, but it is what it is; we can't change the DB structure, only the queries accessing it.

Thank you for any insight.

Query:

SELECT
    COALESCE(CAST([PH].[PAmount] AS decimal(15, 2)) + CAST([PH].[Fee] AS decimal(15, 2)), 0.0) AS [PayAmount],
    [PH].[PDate] AS [PayDate]
FROM [History] AS [PH] WITH (NOLOCK)
WHERE [PH].[PMode] IN ('C', 'P')
    AND [PH].[INNO] = 'XYZ'
    AND [PH].[PStatus] IN ('CONSERVED', 'EXPECTING', 'REFRIGERATED', 'POSTPONED', 'FILED')
    AND [PH].[Locked] = 1
    AND [PH].[PDate] >= 'Jan 1, 2015'
ORDER BY [PH].[PDate] ASC

Fields:

PAmount - non-clustered index, varchar(50)

Fee - not indexed, decimal(6,2)

PDate - clustered index, datetime

PMode - non-clustered index, varchar(5)

INNO - non-clustered index, varchar(50)

PStatus - non-clustered index, varchar(50)

Locked - not indexed, bit

Execution plan:

SELECT (Cost 0%) --- Compute Scalar (Cost 0%) --- Filter (Cost 0%) --- Nested Loops, Inner Join (Cost 0%)
    |--- Index Seek (Cost 4%)
    |--- Key Lookup (Cost 96%)

George
  • 2,165
  • 2
  • 22
  • 34
  • What indexes on the table? How many rows does the query return? What does the execution plan look like? – Martin Smith Aug 25 '15 at 20:45
  • 1
    Short of some indexing there isn't a lot you can do here. I would however highly recommend you stop using NOLOCK. This is dealing with money and that hint means your results are not accurate. http://blogs.sqlsentry.com/aaronbertrand/bad-habits-nolock-everywhere/ – Sean Lange Aug 25 '15 at 20:57
  • Just an fyi: Sql Server 2005 reaches end of life in less than a year. – Joel Coehoorn Aug 25 '15 at 21:35
  • @ Martin: Indexes were provided in the original post. I've edited in the execution plan. Please let me know what else you need. Thanks. – George Aug 26 '15 at 13:26
  • @ Sean: We've been strongly advised to ALWAYS use NOLOCK. Maybe it's a specific of our DB - 99% of DB transactions are INSERTs. We almost never do an edit. NOLOCK prevents records from being locked when working in parallel on a heavy-load table. – George Aug 26 '15 at 13:27
  • @ Joel: Thanks for the info. It's the DBA's job to upgrade things; from my past experience, any large corporation will be reluctant to upgrade to the newest technology. – George Aug 26 '15 at 13:29
  • So looks like it is seeking in on the [INNO] = 'XYZ' predicate. That index has a hidden secondary key of PDate as the clustered index key is added in to the key of non unique NCIs so it might also be able to seek into the > PDate. But in any event it processes them in key order (hence no sort) but the lookups on the other columns are expensive. You are only going to solve this by indexing. How selective is each predicate? – Martin Smith Aug 26 '15 at 17:36
  • Sorry, I forgot to specify: 4% is on `INNO`, 96% is on `PDate` – George Aug 26 '15 at 17:44

5 Answers

3

It seems like you have a misconception about indexes. Indexes don't combine with each other, so it's not just a question of whether a column is "indexed" or "not indexed", and a separate single-column index on each field doesn't accomplish much here. What matters is having indexes with several columns that match up with individual queries. An index on a column won't help a query if it's still more efficient for the database to select on another column first.

I'm getting a little stale at this, but for this query I'd recommend an index that looks something like this:

CREATE NONCLUSTERED INDEX [ix_History_XXXXX] ON [History] 
(
    [INNO] ASC,
    [Locked] ASC,
    [PDate] ASC,
    [PMode] ASC
)
INCLUDE ( PStatus, PAmount, Fee)

You may want to swap around PDate, PMode, and PStatus, depending on their selectivity.

When building an index, you want to list the most specific items first. The general idea is that an index stores each successive item in order. With this index, all of the rows where INNO is 'XYZ' will be grouped together, so the query engine can seek right to that section of the index. The next most specific column is Locked. Even though this is a bit value, because it is limited to exactly one value we are still able to seek directly to the one specific part of the index that will matter for the entire query. Again: I haven't had to do this kind of thing for a while, so you might do just as well listing PMode here; I just don't recall whether the Sql Server query optimizer is smart enough to handle the two values in an efficient way.

From here on out the best option for the index depends on how much each of the query values limits the results. Since we're no longer able to get all of the results into one space, we're gonna have to scan the relevant parts of the index. My instinct here is to use the Date value next. This will allow the scan to walk the index starting with the first date that matches your result, and help it get the records in the correct order, but again: this is just my instinct. You may be able to do better by listing PMode or PStatus first.

Finally, the additional columns in the INCLUDE clause will allow you to complete this query entirely from the index, without actually going back to the full table. You use an INCLUDE clause rather than just appending the columns to the index key to avoid making Sql Server rebuild index entries when those columns are updated. This is why PStatus, for example, probably should not be part of the main index if the status is something that can change, and why you might be better off also leaving Locked out of the index. These are things you'll want to measure and test for yourself, though.
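One way to do that measuring (just a sketch, assuming you can run the query against your TEST copy) is to compare logical reads and CPU time before and after creating the index, rather than relying only on the estimated cost percentages:

SET STATISTICS IO ON;    -- reports logical/physical reads per table
SET STATISTICS TIME ON;  -- reports CPU and elapsed time

-- run the query from the question, once before and once after creating the index
SELECT
    COALESCE(CAST([PH].[PAmount] AS decimal(15, 2)) + CAST([PH].[Fee] AS decimal(15, 2)), 0.0) AS [PayAmount],
    [PH].[PDate] AS [PayDate]
FROM [History] AS [PH] WITH (NOLOCK)
WHERE [PH].[PMode] IN ('C', 'P')
    AND [PH].[INNO] = 'XYZ'
    AND [PH].[PStatus] IN ('CONSERVED', 'EXPECTING', 'REFRIGERATED', 'POSTPONED', 'FILED')
    AND [PH].[Locked] = 1
    AND [PH].[PDate] >= 'Jan 1, 2015'
ORDER BY [PH].[PDate] ASC;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

If the new index gets picked up, the logical reads against History should drop sharply and the Key Lookup should disappear from the actual plan.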

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • I am not sure I understand half of what you wrote about indexes. Judging by that, my SQL is mediocre at best. I was not the one to design the indexes; I am not sure who did, might have been at a time when the whole system was small and nobody bothered or was knowledgeable enough to do it right. I cannot change indexes in production directly, but I CAN suggest changing them if I can prove it works in TEST. I will try to apply your indexes in TEST - I will copy the table so I have 2 identical tables with the same data, then I'll apply your index to one of the tables and compare execution plans. – George Aug 26 '15 at 16:01
  • I've created 2 tables with exactly the same structure and no data. One table has the extra index - yours. The execution plan suggests no difference. Do I actually need data in it? I tried to Generate Scripts to replicate the table with data but it errored out, saying it's too large or something like that. – George Aug 26 '15 at 17:22
  • A query on a single table with no data will never use an index. – Joel Coehoorn Aug 26 '15 at 17:23
  • OK, I'll work on copying data to each table... this may take some time – George Aug 26 '15 at 17:24
  • You don't need all 50million-ish records. Just enough for an index to make a difference. A few thousand should do it. – Joel Coehoorn Aug 26 '15 at 17:37
  • I have 3.5 million in TEST... copying it now. If it takes too long I'll stop and copy less. What would be a good number to get stats? 1000? 10000? – George Aug 26 '15 at 17:41
  • Not sure. Again... I haven't been a developer for 6 years now, so some of this stuff is old for me now. – Joel Coehoorn Aug 26 '15 at 17:55
  • I've injected 10K records; your query was 12%, the original query 88%. I truncated the tables, injected 100K records - both queries are now 50%. I don't understand why this is happening. – George Aug 26 '15 at 20:47
1

I would see if I got better results with ISNULL instead of COALESCE.
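For this query that would just be a swap of the function in the select list (a sketch; for this particular two-argument expression the result should be the same, since ISNULL takes exactly two arguments while COALESCE can take more):

ISNULL(CAST([PH].[PAmount] AS decimal(15, 2)) + CAST([PH].[Fee] AS decimal(15, 2)), 0.0) AS [PayAmount]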

The other thing is looking at the indexes. You listed the fields that are indexed. If those fields are covered by several indexes, I suggest making one good covering index for this query.

A covering index is one where all of the data needed by the query is contained in the index. If an index used by the query is not covering, then there is an extra trip (or trips) to the table to get the rest of the fields. It is more efficient if all of the data is right there in the index.

Check out these articles:

What are Covering Indexes and Covered Queries in SQL Server?

https://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/

For the data that is not part of a join or in the WHERE clause, you can use the INCLUDE keyword. Included fields are not searchable parts of the index, but they save the extra trip back to the table.

Try the index below. All of the fields in the WHERE clause are part of the searchable part of the index, and all of the returned fields that are not part of the WHERE clause are included. You might need to play with the order after looking at the execution plan, but I took my best guess.

Create Nonclustered Index Ix_Ncl_History_CoveringBigSelect on History(PDate, PMode, INNO, PStatus, Locked) Include (PAmount, Fee)

Here is an article about included columns.

Sean
  • 472
  • 2
  • 13
  • No improvement if I remove COALESCE completely (not even checking with ISNULL, just plainly removing COALESCE). Each index is a separate one; there are other fields in the table and other indexes too, each independent of the others. I am not entirely sure what you mean by `one index or many` and `might try making one good covering index for this query`, can you elaborate please? – George Aug 26 '15 at 14:20
  • You answered my question about one index or many. I was asking if the fields you listed as indexed were all in one index, or were there several different indexes. I will edit my answer with more info on covering indexes. – Sean Aug 27 '15 at 16:10
  • Thanks, your index is almost the same as Joel's. When I tested his index with 10K data, it seemed to be working better, but when i injected even more data, the query was again back to the same performance as the original. I can't figure out why. Is there something wrong with my test case? – George Aug 28 '15 at 14:23
  • Looking at Joel's, the biggest difference I see is that he has status as an included column and I have it right in the index. I would try that. Then, as Joel suggested, play with the order in the index. Look at the execution plan with different configurations. – Sean Aug 28 '15 at 20:32
1

I would simply create the following index on the table:

CREATE NONCLUSTERED INDEX idx_History_Locked_PMode_INNO_PStatus_PDate_iPAmount_iFee
    ON dbo.History (Locked, PMode, INNO, PStatus, PDate)
    INCLUDE (PAmount, Fee)
WHERE Locked = 1;                -- Optional filter that can reduce index size (note: filtered indexes require SQL Server 2008 or later, so on 2005 this line would have to be dropped).

This should improve your current query. All conditions should be met here.
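If you want to confirm the optimizer actually picks the new index up once it exists (a sketch; the table name is the one from this answer), sys.dm_db_index_usage_stats shows seeks, scans, and lookups per index:

SELECT i.name, s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
    ON s.object_id = i.object_id
    AND s.index_id = i.index_id
    AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.History');

-- After running your query a few times, user_seeks on the new index should go up;
-- user_updates gives a rough idea of the extra write overhead the index adds.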

Evaldas Buinauskas
  • 13,739
  • 11
  • 55
  • 107
  • Order within the index matters. It seems odd to list `Locked` first, when I'd guess that particular records will lock and unlock fairly regularly. As a bit, it also has poor selectivity. Best to list items first that are both more specific and more stable. – Joel Coehoorn Aug 25 '15 at 22:04
  • Well, I don't like guessing, but I thought that if a row gets locked once, it's locked once and forever. I might be wrong. It's quite hard to judge what index would suit best when there's no example data, unfortunately. I agree that your index might work better. Perhaps it would be even better to add a filter here: `WHERE Locked = 1`. This should reduce index size. Thanks. – Evaldas Buinauskas Aug 25 '15 at 22:08
  • The locked `bit` is more to indicate the last updated record within a chain of related records. Imagine an apple12345 record that goes from `PLANTED` status to `SEEDLING` to `PLANT` to `HARVESTED`. Every time a record for apple12345 goes into the next stage, a DB trigger sets `LOCKED` = 1 to the latest record, while setting it to 0 to any preceding related record. – George Aug 26 '15 at 14:23
0

You are right, the query looks normal. It's a straightforward query with only 'AND' clauses and no "NOT NULL" constraints, joins, or subselects. The conditions are mostly equality comparisons (only the date is relational). If the values in the conditions (like 'C', 'P', 1, 'XYZ', 'CONSERVED', etc.) are selective enough, then you (or the DBA) should define some indexes and the optimizer can use them. Ask the DBA to create an appropriate index for the table.
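A rough way to check that selectivity (just a sketch to run against the real table) is to count how many rows each predicate matches on its own, compared to the total row count:

SELECT COUNT(*) FROM [History];                                   -- total rows
SELECT COUNT(*) FROM [History] WHERE [INNO] = 'XYZ';              -- the current plan already seeks on this one
SELECT COUNT(*) FROM [History] WHERE [Locked] = 1;
SELECT COUNT(*) FROM [History] WHERE [PMode] IN ('C', 'P');
SELECT COUNT(*) FROM [History] WHERE [PDate] >= 'Jan 1, 2015';

The smaller a count is relative to the total, the more selective that predicate, and the better a candidate it is to go at the front of a composite index.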

How many result rows are you expecting to get? If there are a lot (e.g. >> 10,000), the ORDER BY clause could cost a lot.

dev.null
  • 538
  • 2
  • 7
  • The result is very small, should be under one hundred records. The problem is that the table it's pulling records from is large (20, 30, 40 million records, I don't know exactly, but it's millions) – George Aug 26 '15 at 14:14
0

As you said, I assume you can do nothing to the DB, including indexing and structural changes. So what about the client app environment - is it powerful enough to do client-side calculation?

If the answer is yes, I suggest moving the calculation to the client side:

  • Do not cast data types in the query; casting varchar to decimal consumes CPU resources. So get the result directly and do the conversion in your app.
  • For the I/O issue, try to remove the IN conditions, because IN is essentially an "OR" condition. Split your query into small pieces using "=" conditions, send them to your app, and use your client app to "union" the results (see the sketch below).
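A sketch of what that split might look like (the app would then union the result sets, sort by date, and convert the varchar amounts itself):

-- one statement per PMode value; repeat with [PH].[PMode] = 'P'
SELECT [PH].[PAmount], [PH].[Fee], [PH].[PDate]
FROM [History] AS [PH] WITH (NOLOCK)
WHERE [PH].[PMode] = 'C'
    AND [PH].[INNO] = 'XYZ'
    AND [PH].[PStatus] IN ('CONSERVED', 'EXPECTING', 'REFRIGERATED', 'POSTPONED', 'FILED')
    AND [PH].[Locked] = 1
    AND [PH].[PDate] >= 'Jan 1, 2015';

The PStatus IN list could be broken up the same way if it turns out to help.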
cqi
  • 539
  • 3
  • 13
  • Not casting did not seem to make a difference. Same thing with turning IN into a bunch of ORs. I can take the calculation to the client, but pulling the fields separately does not improve the query. Leaving them as varchar and not casting doesn't improve it either. – George Aug 26 '15 at 14:17