
I have a table with the following structure:

CREATE TABLE tbl 
(
    id int NOT NULL,
    field_ID int NULL,
    field_VALUE float NULL,
    field_TIMESTAMP datetime NULL,
    field_QUALITY smallint NULL,
    -- more <fields> columns, for each all of the four above will be repeated
    CONSTRAINT pk_id PRIMARY KEY (id)
);

This table structure is generated by an external program, and I cannot change it apart from minor adjustments. The table currently has roughly 3 million rows.

Now I usually select data for a certain time range and quality threshold:

SELECT *
FROM tbl
WHERE 
    field_TIMESTAMP >= '2023-04-01 00:00:00' AND
    field_TIMESTAMP < '2023-04-08 00:00:00' AND
    field_QUALITY >= 192

This query takes roughly 3 minutes to run. To solve this, I created a nonclustered index on the timestamp column, which should be my primary filter:

CREATE NONCLUSTERED INDEX tbl_idx ON tbl (field_TIMESTAMP);

Unfortunately, the execution plan shows that the optimiser does not use my index when executing the query, and it still runs for ~3 minutes. However, when I manually force it to use the index, the query completes in 0.3 seconds:

SELECT *
FROM tbl
WITH (INDEX(tbl_idx))
WHERE 
    field_TIMESTAMP >= '2023-04-01 00:00:00' AND
    field_TIMESTAMP < '2023-04-08 00:00:00' AND
    field_QUALITY >= 192

I would rather not force the optimiser onto a specific index, since things might change and that would require manually modifying all queries.

Is there a reason that SQL Server decides against using the index? I tried updating the table statistics (UPDATE STATISTICS tbl WITH FULLSCAN), but nothing changed. Can I improve my SELECT in some way?

Update

I have now created the following indices:

CREATE NONCLUSTERED INDEX tbl_TIMESTAMP_IDX 
ON tbl (field_TIMESTAMP ASC);

CREATE NONCLUSTERED INDEX tbl_VALUE_IDX 
ON tbl (field_TIMESTAMP ASC) 
   INCLUDE (field_VALUE);

CREATE NONCLUSTERED INDEX tbl_quality_idx 
ON tbl (field_QUALITY ASC);

CREATE NONCLUSTERED INDEX tbl_filter_timestamp_idx 
ON tbl (field_TIMESTAMP ASC) 
WHERE ([field_QUALITY] >= (192));

My observations (WHERE clause simplified):

Query:

SELECT field_VALUE, field_TIMESTAMP 
FROM tbl 
WHERE field_TIMESTAMP > 'A' 
  AND field_TIMESTAMP < 'B'

This query uses the INCLUDEd index, but that approach would require every column I need to be in the index.

All other queries still do not use any index, regardless of whether field_QUALITY appears in the WHERE clause.
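
For reference, a covering variant along the lines suggested in the comments would also INCLUDE field_QUALITY, so that the quality predicate can be evaluated from the index alone without key lookups. This is only a sketch: the index name is illustrative, and it assumes the SELECT list can stay limited to the value and timestamp columns.

-- Sketch: with field_QUALITY also INCLUDEd, the predicate
-- field_QUALITY >= 192 can be checked from the index pages,
-- avoiding a key lookup per matching row.
CREATE NONCLUSTERED INDEX tbl_value_quality_idx   -- illustrative name
ON tbl (field_TIMESTAMP ASC)
INCLUDE (field_VALUE, field_QUALITY);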

  • Just guesses. Did you try adding brackets around the timestamp conditions, i.e. `(timestamp AND timestamp) AND quality`? What happens if you have the quality field as your first WHERE condition? Did you test with an index containing timestamp and quality? – user743414 Apr 25 '23 at 11:08
  • Your index is not covering; what happens if you add `Field_quality` to the index? You could also consider having the unique clustered index on field_Timestamp, Id. 3 minutes is also extremely long; I can scan a 3m row table *and* return the data to SSMS in ~12 seconds. – Stu Apr 25 '23 at 11:10
  • @Stu, due to the external program I cannot change the clustered index, otherwise I would have done so. I will try adding _QUALITY to the index, but since it primarily consists of 0 and 192, I feel an index is not helpful here. – Timm Apr 25 '23 at 11:16
  • The problem, as @Stu mentioned, is that the index isn't covering. Not only is `field_QUALITY` not in the index, but *none* of the other columns in your table are (apart from `id`, which is `INCLUDE`d implicitly due to it being the clustered index). This means that the index you created (`tbl_idx`) isn't actually helpful to the instance. To use it, it would have to seek to the rows for your wanted values of `field_TIMESTAMP` and *then* perform a key lookup on *every single one* of those rows to see if `field_QUALITY` meets your threshold of `192`, and then get the values of all the other columns. (1/2) – Thom A Apr 25 '23 at 11:20
  • Doing a key lookup isn't a cheap operation, and so unless we're talking a small volume of rows it's highly likely that SQL Server has determined that *scanning* the **entire** clustered index is actually the less expensive operation. (2/2) – Thom A Apr 25 '23 at 11:21
  • @ThomA: The table has 49 columns; there is no way I will `INCLUDE` all those in an index. Also, considering I can reduce execution time from 3 minutes to 0.3 seconds, using the index apparently _is_ the less expensive operation; the compiler just assumes it isn't. Also, leaving `field_QUALITY` out of the `WHERE` clause does not change the execution plan, so I doubt indexing it helps. – Timm Apr 25 '23 at 11:27
  • How many rows have a `field_QUALITY >= 192` and how many less, in percentage? Could be best to use `CREATE NONCLUSTERED INDEX tbl_idx ON tbl (field_TIMESTAMP) WHERE (field_QUALITY >= 192);` or similar – Charlieface Apr 25 '23 at 11:27
  • Then do you *need* to return all the columns, @Timm ? – Thom A Apr 25 '23 at 11:27
  • @ThomA a very wide non-clustered index is often overkill; clearly key lookups are working here, we just need to persuade the compiler it's the best option. – Charlieface Apr 25 '23 at 11:29
  • @ThomA, no, I don't, but as the execution plan shows, the seek for the rows is not the time-limiting factor. And more than 99% of all rows should have a `_quality >= 192` – Timm Apr 25 '23 at 11:29
  • I don't disagree, @Charlieface , I was explaining why the data engine has chosen *not* to use a lookup, not that the OP should be creating a wide index. Hence why I [asked](https://stackoverflow.com/questions/76100446/sql-server-does-not-use-non-clustered-index?noredirect=1#comment134209278_76100446) if they need all the columns in the `SELECT`. – Thom A Apr 25 '23 at 11:31
  • So try changing the index to be filtered by `field_QUALITY` as mentioned. Please also share your execution plans via https://brentozar.com/pastetheplan – Charlieface Apr 25 '23 at 11:39
  • Note that a filtered index will be ignored if the query is parameterised or auto-parameterised on compilation. – Stu Apr 25 '23 at 11:42
  • I updated the question with some more information and indices. I can have a look at whether I can share execution plans, but I would have to heavily redact them, which is not that easy. – Timm Apr 25 '23 at 12:13
  • How many rows are likely to be covered by the date range selection? You might consider refactoring your query to first populate a temp table with the `id` values in the date range and then join that to your table to return the required columns (see the sketch after these comments). You could also experiment with a derived table and a row goal. – Stu Apr 25 '23 at 12:17
  • @Stu: 40,000 rows for that one week – Timm Apr 25 '23 at 12:22
  • If you are not sure why a more inefficient query plan is chosen, you should also double-check that the statistics are up to date. You can force all the statistics on the table to be updated and then see if that makes a difference. If the query plan is now more efficient with the same query, then perhaps your statistics are not being updated; check to make sure you have a maintenance plan to do this, and also check in your DB properties that auto update statistics is enabled. – Igor Apr 25 '23 at 13:34
  • *If you are passing in the values as parameters via an application*, then another reason a more inefficient plan is chosen could be parameter sniffing. To check if this is the case, you can add `OPTION (RECOMPILE)` (see the sketch below); if the expected query plan is now used, then this could very well be the problem. See also this answer, which describes parameter sniffing as a cause of a poorly selected query plan along with other possible reasons: https://stackoverflow.com/a/23942462/1260204 – Igor Apr 25 '23 at 13:37
  • Note: if an index is very fragmented (say 50%), SQL Server might decide a scan is more optimal as a plan. – Mark Schultheiss Apr 25 '23 at 15:32
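
Two of the suggestions above lend themselves to quick sketches. First, the temp-table refactoring Stu describes: a sketch only, not tested against this schema, with the temp table name `#range_ids` being illustrative.

-- Step 1: collect the clustered keys for the date range; this only
-- touches the narrow timestamp index and should be a simple seek.
SELECT id
INTO #range_ids                       -- illustrative name
FROM tbl
WHERE field_TIMESTAMP >= '2023-04-01 00:00:00'
  AND field_TIMESTAMP < '2023-04-08 00:00:00';

-- Step 2: join back to the base table; the wide columns are fetched
-- only for the ~40,000 keys in the range, with the quality filter
-- applied to that small set.
SELECT t.*
FROM tbl AS t
INNER JOIN #range_ids AS r
    ON r.id = t.id
WHERE t.field_QUALITY >= 192;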
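
Second, the `OPTION (RECOMPILE)` check Igor describes, assuming the range is passed in as parameters (the variable names here are illustrative):

DECLARE @from datetime = '2023-04-01 00:00:00';
DECLARE @to   datetime = '2023-04-08 00:00:00';

-- OPTION (RECOMPILE) compiles a fresh plan for the actual parameter
-- values, which helps rule parameter sniffing in or out as the cause.
SELECT *
FROM tbl
WHERE field_TIMESTAMP >= @from
  AND field_TIMESTAMP < @to
  AND field_QUALITY >= 192
OPTION (RECOMPILE);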

0 Answers