2

My problem is simple in nature- query a database, and write output to a set of files (~1000).

I have a SQL query I need to execute from C# on SQL Server. The query is fairly simple except for one of the values in an IN clause.

As an example, the query may look like:

SELECT 
    ID, Name
FROM
    MyTable
WHERE
    SomeValue IN (/* list of values */)

The list of values is around 300 strings that are 6 characters each. The values will ultimately look something like ("A00001", "A00002", etc), and are specified as input by the user of the application.

My initial thought was to build the query out as a long string, with each of the values aggregated together. This doesn't seem like the proper approach however, and seems to be error prone. My next thought was to use a stored procedure, but I'm unsure of how the data should be passed into it. How should this query be built in an efficient way where the data can be reliably passed in?

My other option is to read in ALL data from SQL Server and then filter the results in C# as I receive them. If I do this, however, I'll get around 9 million records which seems to be too many to read into memory as a data set. Because I'm writing files after parsing ~5000 records, I don't want to keep a data reader connected to the database while I chew through the data I'm receiving.

What is the best approach?

John Saunders
  • 160,644
  • 26
  • 247
  • 397
jdylanmc
  • 829
  • 11
  • 24
  • If your data is coming from user input you're likely going to want to carefully sanitize your input to avoid SQL injection attacks. You may be better off using bind variables or temp tables for such a query, as both are effective measures against typical SQL injections. – LBushkin Jan 30 '12 at 19:07
  • What do yo uuse for database interaction? BOTH - "I put together my query as a string" and "I use LINQ because i know what to do" allow that efficiently given an array of input types. Getting application input into this form is a homework in string manipulation for a first year IT student. So, what please IS the problem? – TomTom Jan 30 '12 at 19:20
  • @TomTom The problem boils down to that I'm generating financial reports and the input is too difficult for SSRS to handle. Ontop of that, the output is a series of excel files. The input parameters may be thought of as a list of barcodes that the user wants to reports generated for. Providing you with the actual queries and sample input would add too much unneeded complexity to the question. I already know how to write queries and execute them, what I want to know is how I should efficiently handle a ~10000 character string as a query, or pass a 6000 char string to a stored procedure. – jdylanmc Jan 30 '12 at 19:56

6 Answers6

2

The usual way is to make the IN clause a subquery:

SELECT ProductID from Product where ManufacturerID in 
    (SELECT ManufacturerID from Manufacturer where HomeOfficeLocation = 'France')

If you can't do that: as long as the user is picking from a list you've provided, a simple concatenated list isn't particularly error-prone.

You definitely don't want to loop through a large dataset looking for a small number of items.

Edited to add

To be a bit more complete: what I often do is create a comma-delimited list in the front end (ex. "234, 345, 456") and pass that to a stored procedure as a string; then inside that proc use a user-defined function such as the one discussed here.

I have nothing against temp tables, but usually I only use them if I'm going to do more than one operation on a list.

The reason I said "as long as the user is picking from a list" is to save you having to validate (and sanitize) user input.

Community
  • 1
  • 1
egrunin
  • 24,650
  • 8
  • 50
  • 93
  • He would need to create a table for this though, since the values are passed from the app – JNK Jan 30 '12 at 19:05
  • This was my initial thought but I was unsure of whether this was the correct method to use. It just feels... dirty. – jdylanmc Jan 30 '12 at 19:57
  • I completely agree that this is one of the seemingly unavoidable rough edges of classic SQL programming. – egrunin Jan 30 '12 at 20:01
2

The topic is discussed at length in this series of articles:

You should start with the approach recommended in the first article, using a Table Value Parameter. See also the MSDN usage article: Table-Valued Parameters in SQL Server 2008 (ADO.NET):

void SelectValuesOfInterest(IEnumerable<SqlDataRecord> valuesOfInterest)
SqlCommand cmd = new SqlCommand(@"    SELECT 
        ID, Name
    FROM
        MyTable
    WHERE
        SomeValue IN (SELECT value FROM @tableValueParameter);"
, connection);
SqlParameter tvpParam = insertCommand.Parameters.AddWithValue(
    "@tableValueParameter", valuesOfInterest);
tvpParam.SqlDbType = SqlDbType.Structured;
tvpParam.TypeName = "dbo.MyTVPType";
using (SqlDataReader rdr = cmd.ExecuteReader ())
{
 while rdr.Read() {...}
}
}
Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
1

If the number of items in your IN clause is large enough, you probably want to use a temporary table - insert the values there first then join to the temp table, or use it as a subquery source for your SELECT.

This CodeProject article discusses the use of temp tables; but to summarize for your case:

CREATE TABLE #TempQueryList(Item varchar(300))

// in your app code:
insert into #TempQueryList values ( @item ); -- use BIND variable
insert into #TempQueryList values ( @item ); -- use BIND variable
...
insert into #TempQueryList values ( @item ); -- use BIND variable

// then your query becomes:
SELECT 
  ID, Name
FROM
  MyTable
WHERE
  SomeValue IN (SELECT Item FROM #TempQueryList)
LBushkin
  • 129,300
  • 32
  • 216
  • 265
1

If you can do it with a subquery (as egrunin showed), that's easiest and fastest. If you can't do that (ie, all you've got is a list of values), then you might want to look at a table value parameter.

This question has some reading that might be of interest.

Community
  • 1
  • 1
D. Lambert
  • 1,304
  • 6
  • 12
1

Here is one way to do it using xml:

C#:

StringBuilder sb = new StringBuilder();
sb.AppendLine("<SelectedValues>");

foreach (string s in userSelectedValues)
  sb.AppendFormat("<row val=\"{0}\" />{1}", s, Environment.NewLine);
sb.AppendLine("</SelectedValues>");

Pass this sb.ToString() AS XML TO your stored procedure

SQL:

CREATE PROCEDURE dbo.MyStoredProcedure
(
  @UserSelectedValuesXml XML
)     

SELECT * 
FROM dbo.MyTable
WHERE Id IN ( SELECT ref.value('@val', 'varchar')
              FROM @UserSelectedValuesXml.nodes('/SelectedValues/row') AS T(ref))
Bassam Mehanni
  • 14,796
  • 2
  • 33
  • 41
  • Would this be faster than a table value parameter? – jdylanmc Jan 30 '12 at 20:02
  • @Joe to send the list to ur stored procedure you will need to send it as XML or a big VARCHAR, you could take the xml and dump it in a table value parameter but this elements the need for this step – Bassam Mehanni Jan 30 '12 at 20:06
  • I met with an engineer and he suggested the XML method over the table value parameters. Thanks! – jdylanmc Jan 30 '12 at 20:57
  • @BassamMehanni: With table value parameters you don't need XML at all, you just send a collection of objects. Not sure what you are meaning here. Also XML processing isn't exactly the best inside SQL server. – NotMe Jan 30 '12 at 21:01
  • @ChrisLively my bad I was thinking Table variables, is it worth while to create those strongly typed table structures vs xml, how much performance boost to you get vs xml, I am only asking because I never used them myself, I always used XML instead because it's easier to construct and consume, thanks – Bassam Mehanni Jan 30 '12 at 21:48
  • @BassamMehanni: We use them extensively in queries where we need to limit by a list. Here's one guy that did a little research: http://www.adathedev.co.uk/2010/02/sql-server-2008-table-valued-parameters.html you might also read: http://www.sommarskog.se/arrays-in-sql-2008.html#Performance_Considerations – NotMe Jan 30 '12 at 22:25
1

I'd use a table value parameter. Depending on the version of SQL server you have access to you can pass an entire datatable into a stored procedure then query off of it just like you would a normal table.

Basically, you'd create a table in C# that contains your list. You'd pass that as a parameter to a stored procedure. In the proc you could just do an inner join between MyTable and the parameter you passed to get the results you want.

NotMe
  • 87,343
  • 27
  • 171
  • 245
  • This seems like the correct method to use. Thanks for pointing me in that direction, I hadn't thought of this! – jdylanmc Jan 30 '12 at 20:01