I am tasked with the following: The customer wants to read a number (possibly thousands) of values from a CSV file. I have to use these values in an SQL SELECT statement like this:

SELECT * FROM myTable WHERE myTable.Value IN (csvValue1, csvValue2, ..., csvValueN)

The question is: Will this work for an arbitrary number of CSV values, and will it perform well for a large number of values?

I will need to save the SQL as a string internally in my C# application for later use (if that makes a difference for alternative solutions).

– Michael
  • I'm not posting this as an answer, because I'm not sure if it'll be useful in your particular situation, but are you aware of [bulk insert](http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/), which is a fairly quick and simple way of loading a CSV file into a SQL Server table? You talk about saving the string internally for later use: would it be possible to save the data as a table instead, and then perform joins or subqueries to get the information you want? (A sketch of the bulk insert idea follows these comments.) – David Hedlund Nov 23 '11 at 08:50
  • possible duplicate of [Maximum size for a SQL Server Query? IN clause? Is there a Better Approach](http://stackoverflow.com/questions/1869753/maximum-size-for-a-sql-server-query-in-clause-is-there-a-better-approach) – Martin Smith Nov 23 '11 at 08:56
  • @DavidHedlund: Unfortunately I can't just put the data in a 'real' table - but bulk insert into a temporary table will probably be fine. Thanks. – Michael Nov 23 '11 at 12:58
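
A minimal sketch of the BULK INSERT idea from the comments, assuming a one-value-per-line CSV file; the file path, temp table name, and column type are illustrative, not from the question:

-- Load the CSV into a temp table, then filter with a subquery instead of a huge IN list.
CREATE TABLE #csv_values (Value int)

-- BULK INSERT reads the file on the server, so the path must be visible to SQL Server
BULK INSERT #csv_values
FROM 'C:\data\values.csv'  -- hypothetical path
WITH (ROWTERMINATOR = '\n')

SELECT m.*
FROM myTable AS m
WHERE m.Value IN (SELECT Value FROM #csv_values)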

3 Answers


You really don't want to do that. It would be better to dump those values into an indexed table and use IN as a subquery, which the optimizer typically implements as a SEMI JOIN, rather than an array of literals, which is typically implemented as a series of OR operations.

From BOL (Books Online):

Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table.

Error 8623:

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

Error 8632:

Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them.
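
A minimal sketch of the indexed-table approach described above; the work table name and the loading step are assumptions for illustration:

-- Dump the CSV values into a temp table; the PRIMARY KEY gives it an index
CREATE TABLE #filter_values (Value int PRIMARY KEY)

-- ... load the CSV values into #filter_values here ...

-- IN (subquery) lets the optimizer use a semi join instead of thousands of ORs
SELECT m.*
FROM myTable AS m
WHERE m.Value IN (SELECT f.Value FROM #filter_values AS f)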

– Code Magician

Performance is really poor with the IN operator and a lot of values (sometimes the request doesn't even complete and crashes).

I do it like this:

SELECT CAST(Items AS int) AS id
INTO #table_temp
FROM dbo.split(@str_ids_comma, ',')

SELECT *
FROM myTable
WHERE EXISTS (
    SELECT *
    FROM #table_temp
    WHERE id = myTable.Value
)

" you can find a definition of split here : http://www.sqlservercentral.com/articles/Tally+Table/72993/

– remi bourgarel
  • Your split function is likely to suffer with larger datasets as well. Check out this article for an in-depth look at split methods in the db: http://www.sqlservercentral.com/articles/Tally+Table/72993/ – Code Magician Nov 23 '11 at 09:01
  • The one I posted was just an example, yours might be better, I'll add your link – remi bourgarel Nov 23 '11 at 09:59
  • Well, apart from the split issue, this looks good. I will just dump the values into the temp table with batches of INSERT INTO statements. – Michael Nov 23 '11 at 10:20
  • If you have a lot of values, your batch of INSERT INTOs might end up in a timeout (above 500 values in my case); see the sketch below. – remi bourgarel Nov 23 '11 at 10:42
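
A sketch of the batching idea from these comments: in SQL Server a single INSERT ... VALUES is limited to 1,000 row constructors, so a large list has to be spread over several statements (the values below are illustrative):

-- First batch of up to 1,000 rows
INSERT INTO #table_temp (id) VALUES (1), (2), (3)
-- Next batch
INSERT INTO #table_temp (id) VALUES (1001), (1002)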

I guess that the maximum allowed number of elements in an IN list depends on the database. I read somewhere, because I had a similar problem, that Oracle XE has a limit of 1,000 list elements. You should look into this for your specific database.
As for the alternative solution, you could split the query into several batches, where each list contains a smaller number of elements.
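
A sketch of that batching workaround: split the values into chunks below the limit and OR the IN predicates together (the values and chunk size are illustrative):

-- Each IN list stays under the database's limit (e.g. 1,000 for Oracle)
SELECT *
FROM myTable
WHERE myTable.Value IN (1, 2, 3 /* ... up to 1,000 elements ... */)
   OR myTable.Value IN (1001, 1002 /* ... next chunk ... */)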

– Less