SELECT with WHERE drops performance

Question

In my application I use queries like

SELECT column1, column2 
FROM table 
WHERE myDate >= date1 AND myDate <= date2

My application crashes and returns a timeout exception. I copy the query and run it in SSMS. The results pane displays ~ 40 seconds of execution time. Then I remove the WHERE part of the query and run. This time, the returned rows appear immediately in the results table, although the query continues to print more rows (there are 5 million rows in the table).

My question is: how can a WHERE clause affect query performance?

Note: I don't change the CommandTimeOut property in the application. Left by default.

You most likely need an index on that column, otherwise MSSQL has to search every record in your database in order to find the relevant data. You can inspect the query plan to see the type of query, see https://stackoverflow.com/a/7359705/507793 on how to get the query plan. — Matthew, Oct 18 '21 at 17:13
What's the default timeout? 30 sec? Highly recommend changing that to at least a couple minutes. Obviously, add an index on that column too but it's highly unlikely all your queries will execute within the timeout no matter how performant they are — Radagast, Oct 18 '21 at 17:19
Please share the query plan via https;//berntozar.com/pastetheplan. please also add table and index definitions to your question. Otherwise we cannot answer this — Charlieface, Oct 18 '21 at 22:11
@PhilCoulson There is no reason why even millions of rows cannot be returned in a few seconds if the indexing is good — Charlieface, Oct 18 '21 at 22:12
@Charlieface Yes that's true, but my suggestion was more general. Datasets can be much much bigger and one might not always have the permissions to tinker with indexes (a lot of data analysts start their career with read only permissions). Or there could be huge tables that aren't indexed on purpose to optimize for faster inserts. To me, 30 seconds seemed too short for a query timeout — Radagast, Oct 18 '21 at 22:24
@PhilCoulson Still, the answer to "we're spending too much money" can't always be "well increase the budget!" :-) — Aaron Bertrand, Oct 19 '21 at 01:57

score 1 · Answer 1 · answered Oct 18 '21 at 18:29

Without a WHERE clause, SQL Server is told to just start returning rows, so that's what it does, starting from the first row it can find efficiently (which may be the "first" row in the clustered index, or in a covering non-clustered index).

When you limit it with a where clause, SQL Server first has to go find those rows. That's what you're waiting on, because you don't have an index on myDate (or date1/date2, which I'm not sure are columns or variables), it needs to examine every single row.

Another way to look at it is to think of a phone book, which is an outdated analogy but gets the job done. Without a WHERE clause, it's like you're asking me to read you off all of the names and numbers in the book. If you add a WHERE clause that is not supported by an index, like read me off the names and numbers of every person with the first name 'John', it's going to take me a lot longer to start returning rows because I can't even start until I find the first John.

Or a slightly different analogy is to think of the index in a book. If you ask me to read off the page numbers for all the terms that are indexed, I can do that from the index, just starting from the beginning and reading through until the end. If you ask me to read off all the page numbers for all the terms that aren't in the index, or a specific unindexed term (like "the"), or even all the page numbers for indexed terms that contain the letter a, I'm going to have a much harder time.

SELECT with WHERE drops performance

1 Answers1