4

LINQ to SQL did a horrible job translating one of my queries, so I rewrote it by hand. The problem is that the rewrite necessarily involves an IN clause, and I cannot for the life of me figure out how to pass a collection to ExecuteQuery for that purpose. The only thing I can come up with, which I've seen suggested on here, is to use string.Format on the entire query string to kluge around it—but that will prevent the query from ever ending up in the query cache.

What's the right way to do this?

NOTE: I am using raw SQL passed to ExecuteQuery, as I said in the very first sentence. Telling me to use Contains is not helpful unless you know a way to mix Contains with raw SQL.

Benjamin Pollack
  • 27,594
  • 16
  • 81
  • 105

5 Answers

7

Table-Valued Parameters

On Cheezburger.com, we often need to pass a list of AssetIDs or UserIDs into a stored procedure or database query.

The bad way: Dynamic SQL

One way to pass this list in was to use dynamic SQL.

 IEnumerable<long> assetIDs = GetAssetIDs();
 // Join concatenates the IDs into "1,2,3,..." (an extension equivalent to string.Join),
 // and ExecEnumerableSql is our in-house query helper.
 var myQuery = "SELECT Name FROM Asset WHERE AssetID IN (" + assetIDs.Join(",") + ")";
 return Config.GetDatabase().ExecEnumerableSql(dr => dr.GetString("Name"), myQuery);

This is a very bad thing to do:

  1. Dynamic SQL gives attackers a weakness by making SQL injection attacks easier.
    Since we are usually just concatenating numbers together, this is highly unlikely, but if you start concatenating strings together, all it takes is one user to type ';DROP TABLE Asset;SELECT ' and our site is dead.
  2. A query assembled this way can't live in a plain stored procedure, so it had to be stored in code instead of in the DB schema.
  3. Every time we run this query, the query plan must be recalculated. This can be very expensive for complicated queries.

However, it does have the advantage that no additional decoding is necessary on the DB side, since the AssetIDs are embedded directly in the query text.

The good way: Table-Valued Parameters

SQL Server 2008 adds a new ability: users can define a table-valued database type. Most other types are scalar (they hold only one value), but a table-valued type can hold multiple values, as long as the values are tabular.

We've defined three types: varchar_array, int_array, and bigint_array.

CREATE TYPE bigint_array AS TABLE (Id bigint NOT NULL PRIMARY KEY)

Both stored procedures and programmatically defined SQL queries can use these table-valued types.

  IEnumerable<long> assetIDs = GetAssetIDs();
  return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"),
      "SELECT Name FROM Asset WHERE AssetID IN (SELECT Id FROM @AssetIDs)", 
      new Parameter("@AssetIDs", assetIDs));

Advantages

  1. Can be used in both stored procedures and programmatic SQL without much effort
  2. Not vulnerable to SQL injection
  3. Cacheable, stable queries
  4. Does not lock the schema table
  5. Not limited to 8k of data
  6. Less work done by both DB server and the Mine apps, since there is no concatenation or decoding of CSV strings.
  7. "typical use" statistics can be derived by the query analyzer, which can lead to even better performance.

Disadvantages

  1. Only works on SQL Server 2008 and above.
  2. There are rumors that TVPs are prebuffered in their entirety before the query executes, which means phenomenally large TVPs may be rejected by the server. Further investigation of this rumor is ongoing.

Further reading

This article is a great resource to learn more about TVP.

Stefan Rusek
  • 4,737
  • 2
  • 28
  • 25
3

If you can't use table-valued parameters, this option is a little faster than the XML option while still letting you stay away from dynamic SQL: pass the joined list of values as a single string parameter, and parse the delimited string back into values in your query. See this article for instructions on how to do the parsing efficiently.
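
For illustration, here is a rough sketch of that approach against the Asset example from the TVP answer above, assuming a LINQ to SQL DataContext db, an Asset entity, and a hypothetical table-valued split function dbo.SplitBigints built along the lines the article describes (none of these names come from the article itself):

    // The IDs travel as one string parameter, so the SQL text (and the cached
    // plan) stays the same no matter how many values are in the list.
    IEnumerable<long> assetIDs = GetAssetIDs();
    string csv = string.Join(",", assetIDs.Select(id => id.ToString()).ToArray());

    var assets = db.ExecuteQuery<Asset>(
        "SELECT * FROM Asset WHERE AssetID IN (SELECT Value FROM dbo.SplitBigints({0}, ','))",
        csv);

The split function does the parsing server-side, which is where the 8KB and lock-contention caveats mentioned in the comments come in.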

Alex Weitzer
  • 181
  • 1
  • 12
  • 1
  • That is exactly what Cheezburger did until we switched to table-valued parameters. We hit two limitations with the parameterized CSV approach, but as always, YMMV. 1: the 8KB limit (not usually a concern, but sometimes it helps to have more space); 2: parsing on the SQL Server side bogged down under lock contention. – Jacob Krall Mar 21 '12 at 02:22
  • Did you mean parsing of the string or of the query? There have been some advances in the string parsing approach that might allow for better plans, but clearly your site is a lot higher volume than anything I've had the pleasure of working on. – Alex Weitzer Mar 21 '12 at 18:53
  • I meant parsing the parameter; the query should be easy to parse. The article you linked has a registration wall, so I can't read it. – Jacob Krall Mar 22 '12 at 00:19
2

I have a sneaking suspicion that you're on SQL Server 2005. Table-valued parameters weren't added until 2008, but you can still use the XML data type to pass sets between the client and the server.
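
For example (purely a sketch; db, GetAssetIDs, and the Asset entity are illustrative, and the element names are arbitrary), you can build the XML on the client and shred it server-side with the xml type's nodes()/value() methods, all through ExecuteQuery:

    // The whole list goes over as a single string parameter; CAST turns it into
    // the xml type so it can be shredded into rows and used inside IN.
    IEnumerable<long> ids = GetAssetIDs();
    string idsXml = "<ids>" +
        string.Join("", ids.Select(id => "<id>" + id + "</id>").ToArray()) +
        "</ids>";

    var assets = db.ExecuteQuery<Asset>(
        @"SELECT * FROM Asset
          WHERE AssetID IN (
              SELECT n.p.value('.', 'bigint')
              FROM (SELECT CAST({0} AS xml) AS x) AS src
              CROSS APPLY src.x.nodes('/ids/id') AS n(p))",
        idsXml);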

Jacob Krall
  • 28,341
  • 6
  • 66
  • 76
0

This works for SQL Server 2005 (and later):

create procedure IGetAListOfValues
   @Ids xml -- This will receive a list of values
as
begin
    -- You can load them into a temp table or use it as a subquery:
    create table #Ids (Id int);
    INSERT INTO #Ids
    SELECT DISTINCT params.p.value('.','int')
    FROM @Ids.nodes('/params/p') as params(p);
    ...
end

You have to invoke this procedure with a parameter like this:

exec IGetAListOfValues
@Ids = '<params> <p>1</p> <p>2</p> </params>' -- xml parameter

The nodes function takes an XPath expression. In this case it's /params/p, which is why the XML uses <params> as the root and <p> as the element.

The value function casts the text inside each p element to int, but you can easily use it with other data types. In this sample there is a DISTINCT to avoid repeated values, but of course you can remove it depending on what you want to achieve.

I have an auxiliary (extension) method that converts an IEnumerable<T> into a string that looks like the one shown in the exec example. It's easy to create one and have it do the work for you whenever you need it. (You have to check the data type of T and convert it to a string that can be parsed on the SQL Server side.) This way your C# code is cleaner and your SPs follow the same pattern for receiving the parameters (you can pass in as many lists as needed).
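
A minimal sketch of such an extension method (the name ToXmlParam is arbitrary; this version handles any T by converting it with the invariant culture and escaping the text):

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;
    using System.Security;

    public static class XmlParamExtensions
    {
        // { 1, 2, 3 } -> "<params><p>1</p><p>2</p><p>3</p></params>"
        public static string ToXmlParam<T>(this IEnumerable<T> values)
        {
            var items = values.Select(v =>
                "<p>" + SecurityElement.Escape(
                    Convert.ToString(v, CultureInfo.InvariantCulture)) + "</p>");
            return "<params>" + string.Concat(items.ToArray()) + "</params>";
        }
    }

With that in place, myIds.ToXmlParam() produces the string you pass as @Ids.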

One advantage is that you don't need to create anything special in your database for it to work.

Of course, you don't need to create a temp table as in my example; you can use the query directly as a subquery inside an IN predicate:

    WHERE MyTableId IN (SELECT DISTINCT params.p.value('.','int') 
    FROM @Ids.nodes('/params/p') as params(p) )
JotaBe
  • 38,030
  • 8
  • 98
  • 117
0

I am not 100% sure that I understand the problem correctly, but LINQ to SQL's ExecuteQuery has an overload that takes parameters, and the query uses placeholders similar to string.Format.

Using this overload is safe against SQL injection, and behind the scenes LINQ to SQL translates it into a call to sp_executesql with parameters.

Here is an example:

string sql = "SELECT * FROM city WHERE city LIKE {0}";
db.ExecuteQuery(sql, "Lon%"); //Note that we don't need the single quotes 

This way one gets the benefit of parameterized queries even while using dynamic SQL.

However, when it comes to using IN with a dynamic number of parameters, there are two options:

  1. Construct the string dynamically, and then pass the values as an array, as in:

    string sql = "SELECT * FROM city WHERE zip IN (";
    List<string> placeholders = new List<string>();
    for(int i = 0; i < zips.Length;i++)
    {
          placeholders.Add("{"+i.ToString()+"}");
    }
    sql += string.Join(",",placeholders.ToArray());
    sql += ")";
    db.ExecuteQuery(sql, zips.ToArray());
    
  2. We can use a more compact approach by using the LINQ extension methods, as in:

    string sql = "SELECT * FROM city WHERE zip IN ("+
      string.Join("," , zips.Select(z => "{" + zips.IndexOf(f).ToString() + "}"))
    +")";
    db.ExecuteQuery(sql, zips.ToArray());
    
yoel halb
  • 12,188
  • 3
  • 57
  • 52