
I have a list of strings (the number of strings varies from 10 to 100) and I need to select values from a large table (100K-5M records) as efficiently as I can. It seems to me that I basically have 3 options: to use an 'in' clause, a table variable or a temp table. Something like this:

select col1, col2, col3, name from my_large_table 
where index_field1 = 'xxx'
and index_field2 = 'yyy'
and name in ('name1', 'name2', 'name3', ... 'nameX')

or

declare @tbl table (name nvarchar(50))
insert @tbl(name) values ('name1'), ('name2'), ('name3'), ... ('nameX')

select col1, col2, col3, my_large_table.name 
from my_large_table inner join @tbl as tbl on (tbl.name = my_large_table.name)
where index_field1 = 'xxx'
and index_field2 = 'yyy'
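
or, with a temp table, something along these lines (the #tbl name and its primary key are just for illustration):

create table #tbl (name nvarchar(50) primary key)
insert #tbl(name) values ('name1'), ('name2'), ('name3') -- ... ('nameX')

select col1, col2, col3, my_large_table.name 
from my_large_table inner join #tbl as tbl on (tbl.name = my_large_table.name)
where index_field1 = 'xxx'
and index_field2 = 'yyy'

drop table #tbl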

The large table has a clustered index on (index_field1, index_field2, name, index_field3).

Actually, for each set of names I have 4-5 queries against the large table: a select, then an update and/or insert and/or delete according to some logic - each time constraining the query to this set of names.
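
For instance, one of the follow-up statements in the batch looks roughly like this (col1 and the value 'zzz' are just placeholders, reusing the table variable from above):

update LT
set LT.col1 = 'zzz'
from my_large_table as LT
inner join @tbl as tbl on (tbl.name = LT.name)
where LT.index_field1 = 'xxx'
and LT.index_field2 = 'yyy'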

The name set and the queries are built dynamically in the .net client, so there are no problems with readability, code simplicity or the like. The only goal is the best possible performance, since this batch will be executed many times. So the question is: should I use an 'in' clause, a table variable or something else to write my condition?

Grisha
  • Test all three variations & look at the I/O stats & execution plan. Table variables are often the wrong answer as they can easily result in a bad query plan. – alroc Dec 21 '15 at 13:42
  • Also read about `IN` clause here [Maximum size for a SQL Server Query? IN clause?](http://stackoverflow.com/questions/1869753/maximum-size-for-a-sql-server-query-in-clause-is-there-a-better-approach) – Pரதீப் Dec 21 '15 at 13:46
  • @alroc If I could test it myself on real data, I wouldn't ask. I'd prefer an answer BEFORE I give my program to customers who have these big tables. Of course, I'll do some tuning afterwards as well, but I want my first version to be as good as I can make it – Grisha Dec 21 '15 at 13:46
  • @Grisha why are you unable to test against data that's representative of your production environment? The fact of the matter is, you can test or you can guess, but until you have a test environment that accurately portrays production, you won't be sure about performance until you get there. – alroc Dec 21 '15 at 15:00

1 Answer


As already mentioned, you should avoid using table variables for fairly large data sets, as they do not allow indexes (more details here).

If I understood correctly, you have multiple queries using the same set of names, so I would suggest the following approach:

1) create a persistent table (BufferTable) to hold the word list: PkId, SessionId, Word (see the sketch after this list).

2) for each session using some set of words, bulk insert the words here (SessionId will be unique to each batch of queries). This should be very fast for tens to hundreds of words.

3) write your queries like the one below:

select LT.col1, LT.col2, LT.col3, LT.name 
from my_large_table LT
   join BufferTable B ON B.SessionId = @SessionId AND LT.name = B.Word
where LT.index_field1 = 'xxx'
and LT.index_field2 = 'yyy'

4) An index on (SessionId, Word) is required for best performance (see the sketch below).
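
A rough sketch of steps 1, 2 and 4 (the table name, index name, column types and the multi-row insert are assumptions for illustration; SqlBulkCopy or a table-valued parameter from the .NET client would do the same loading job):

create table dbo.BufferTable
(
    PkId      int identity(1,1) not null primary key,
    SessionId uniqueidentifier  not null,
    Word      nvarchar(50)      not null
)

-- index supporting the join on (SessionId, Word) in step 3
create index IX_BufferTable_SessionId_Word on dbo.BufferTable (SessionId, Word)

-- load one batch of words for a session
-- (@SessionId would normally come in as a parameter from the client)
declare @SessionId uniqueidentifier = newid()

insert dbo.BufferTable (SessionId, Word)
values (@SessionId, N'name1'),
       (@SessionId, N'name2'),
       (@SessionId, N'name3')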

This way, you do not have to push the words for every query.

BufferTable is best emptied periodically, as deletes are expensive (truncating it when nobody is using it is an option).
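
For example (assuming cleanup only runs when no batch is active):

-- clear everything between runs
truncate table dbo.BufferTable

-- or remove only the words of a finished session
delete from dbo.BufferTable where SessionId = @SessionId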

Alexei - check Codidact
  • The list of words is small, so I don't think I need to bother with indexes on this table. Bulk insert is another query, so there is some additional cost of preparing it, sending the data to the server over the network, etc. Do you think it will still be better than just using 'in' with a hard-coded set of words? – Grisha Dec 21 '15 at 14:09
  • They do not allow indexes **and** do not have statistics. The latter causes bad execution plans. – TT. Dec 21 '15 at 14:13
  • I thought you had several queries that "join" with this list of words, so a little setup won't be a significant overhead. Bulk inserting hundreds of words should take far less time than selecting millions of rows from a single table. – Alexei - check Codidact Dec 21 '15 at 14:37