0

Consider 2 queries which need to return all columns:

Query #1

select * 
from TA 
where id in (select id from TB where filterColumn = ?) 

Query #2

select * 
from TA 
where id in (?, ?, ? ... n)

TA contains millions of records and subquery on TB returns 1-2000 records (but generally, just few hundreds).

Say, result of these queries would be 10k records.

For the 1st query, there would be 1 clustered index scan for all records.

For the 2nd - 10k key lookups. Also, if there are more than 2.1k parameters - there would be multiple queries (rare case).

In terms of general performance and simultaneous insert/update/delete statements, which of the queries is preferred?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
vlaku
  • 1,534
  • 2
  • 14
  • 27
  • 3
    What do the queries say? What ran faster? What did the query plans show? This is also probably better asked on [DBA](http://dba.stackexchange.com). – Thom A Jun 10 '19 at 10:34
  • For large numbers of ID, either in the literal list or returned from the subquery, then the subquery version might run faster because it can be tuned via an index. – Tim Biegeleisen Jun 10 '19 at 10:35
  • 2
    *If you have two horses and you want to know which of the two is the faster then **race your horses*** - see [Which is faster?](http://ericlippert.com/2012/12/17/performance-rant/) by Eric Lippert for more background – marc_s Jun 10 '19 at 10:36
  • *Larnu* running time is more or less same, but there is difference in cpu time (10x lower for valued query) and parsing time (x times higher for valued query). What I seek here is less wait times on this table due to less locks in case of lookups. *TimBiegeleisen* index won't really help as I need all the columns in result. SQL server will choose to use scan in this case over index. – vlaku Jun 10 '19 at 11:16

1 Answers1

0

and subquery on TB returns 1-2000 records

...

In terms of general performance and simultaneous insert/update/delete statements, which of the queries is preferred?

The first approach, because:

  1. Parsing thousands or even hundreds of literals can cost extra resources of the query optimizer.
  2. Query plans are not going to be reused if IN values are changed, therefore recompilations
  3. Microsoft warns about this in the documentation:

Explicitly including an extremely large number of values (many thousands of values separated by commas) within the parentheses, in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table, and use a SELECT subquery within an IN clause.

Error 8623:

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

Error 8632:

Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them.

References:

Alexander Volok
  • 5,630
  • 3
  • 17
  • 33
  • Yes, I'm aware about these limitations. But as I said, in 90% of the cases there would be few hundreds parameters at most. I'm ready to trade a few (mili)seconds for less blockings. Also, I wrote that 1-2000 is a result of filtering, but the table itself is bigger (few millions records). Do you still think it's better? – vlaku Jun 10 '19 at 12:30
  • @vlaku, according to your comments, the main goal is to avoid locks on TB during (long running) clustered index scan on TA. If I understand this correctly, perhaps **memory-optimized** table variables (or parameters) is something that will work better than "if there are more than 2.1k parameters - there would be multiple queries (rare case)." Another thing: key lookups on TB will be gone after defining a covered index like `create index ix1 on TB (filterColumn) INCLUDE(id)` – Alexander Volok Jun 10 '19 at 13:11
  • TB already has index, key lookups only occur on TA when parameters are used (2nd query). Scan and blocking queries is on TA. I don't have much control over these queries in fact (as I have multi-db targeted application and use orm for them). So I need to know for my described case should I chose 1st or 2nd query. – vlaku Jun 12 '19 at 06:26