9

When implementing table-valued parameters, one of the most common ways to generate an IEnumerable<SqlDataRecord> for use by the parameter is code like this (e.g., https://stackoverflow.com/a/10779567/18192 ):

public static IEnumerable<SqlDataRecord> Rows(List<int> simpletable)
{
    var smd = new []{ new SqlMetaData("id", SqlDbType.Int)};
    var sqlRow = new SqlDataRecord(smd);
    foreach (int i in simpletable)
    {
        sqlRow.SetInt32(0, i);
        yield return sqlRow;
    }
}
//...
var param = sqlCmd.Parameters.AddWithValue("@retailerIDs", Rows(mydata)); 
param.SqlDbType = SqlDbType.Structured;
param.TypeName = "myTypeName";

This code does seem to work. While reusing SqlMetaData does not set off too many alarm bells, declaring the SqlDataRecord outside the foreach loop feels incredibly suspicious to me:

A mutable object is modified and then yielded repeatedly.

As an example of why this is concerning, calling var x = Rows(new[] { 100, 200}.ToList()).ToList().Dump() in LinqPad spits out 200,200. This approach seems to rely on an implementation detail (that rows are processed individually), but I don't see any documentation which promises this.

Is there some mitigating factor which renders this approach safe?

Community
  • 1
  • 1
Brian
  • 25,523
  • 18
  • 82
  • 173
  • At first sight it does look suspicious. SqlDataRecord inherits from System.Object, so it is a reference type. If this were a normal foreach loop, you would be returning the same object repeatedly. However I think the yield keyword changes things. The method Rows is called once for each item, so a new SqlDataRecord is instantiated for each row. See MSDN > yield (C# Reference): https://msdn.microsoft.com/en-us/library/9k7k7cf0.aspx – RichardCL May 03 '16 at 16:17
  • 2
    @RichardCL: SqlDataRecord is a reference type. Thus, `yield return` is spitting out a copy of the reference, but that reference is pointing to the same object each time. Hence why my example snippet (`var x = Rows(new[] { 100, 200}.ToList()).ToList().Dump()`) spits out `200,200` rather than `100,200`. The only reason the code works in the case of a `SqlCommand` is that the each row is processed individually...but I believe that is an implementation detail, rather than something documented. – Brian May 03 '16 at 16:33
  • There is no way to make this code safe, unless it run on *only* one thread, and the return value was *never* used outside its enumerator - never converted to a list, or array or whatever. This would make it useless eg for batch operations, display on a Grid etc. – Panagiotis Kanavos May 17 '16 at 14:48
  • 1
    @PanagiotisKanavos: As I state, this code is being used as a table-valued SP parameter, so none of those concerns apply, excepting in so far as they are violated by implementation details of the various `SqlCommand.Execute` calls. – Brian May 17 '16 at 19:14
  • 1
    @Brian you should phrase this in the opposite way: the only way this code can work is if `Execute` is coded in a *very* specific way and no other code results in an enumeration of the iterator, eg no `ToList` or `ToArray` calls. Otherwise you'd pass an array of X references to the same object. Stated like this it's obvious that this code is not safe. – Panagiotis Kanavos May 18 '16 at 07:13

3 Answers3

4

This approach seems to rely on an implementation detail (that rows are processed individually), but I don't see any documentation which promises this.

Is there some mitigating factor which renders this approach safe?

As user1249190 points out, Reusing SQLDataRecord is explicitly recommended in the remarks section of https://learn.microsoft.com/en-us/dotnet/api/microsoft.sqlserver.server.sqldatarecord#remarks :

This class is used together with SqlPipe to send result sets to the client from managed code stored-procedures. When writing common language runtime (CLR) applications, you should re-use existing SqlDataRecord objects instead of creating new ones every time. Creating many new SqlDataRecord objects could severely deplete memory and adversely affect performance.

Obviously this recommendation does not apply to usage across threads: The documentation also explicitly warns that "Any instance members are not guaranteed to be thread safe."

Kols
  • 3,641
  • 2
  • 34
  • 42
Brian
  • 25,523
  • 18
  • 82
  • 173
  • That comment is specific to using SqlPipe in a CLR stored procedures. It seems the SqlDataRecord docs were written when that was the main/only use case. Now they're used in tvp parameters from the other end too, and the same logic may apply... or not. As Brian says in the comments a couple of answers below, the question is really how the consumer plans to use the IEnumerable - eager or lazy? In terms of sending tvps, the fact the referenced example works at all is an indication tvp evaluation is lazy – user1664043 Sep 09 '21 at 17:31
1

If you don't need it outside of the foreach loop at all, I don't see why you would want to re-use it.

I found this question Is there a reason for C#'s reuse of the variable in a foreach? which links to this answer in another question Is it better coding practice to define variables outside a foreach even though more verbose? where Jon Skeet answered saying:

There's no advantage to declaring the variables outside the loop, unless you want to maintain their values between iterations.

(Note that usually this makes no behavioural difference, but that's not true if the variables are being captured by a lambda expression or anonymous method.)

Community
  • 1
  • 1
Zack
  • 2,789
  • 33
  • 60
  • 2
    Reusing it saves on allocation -- which we care about in this case because, if we didn't, we'd be using `DataTable`. As to how significant the saving is in this case is another matter, since the objects are all short lived and should be easy fodder for the GC. – Jeroen Mostert May 17 '16 at 14:38
  • 1
    @JeroenMostert modifying the same shared instance inside a loop only ensures the callers will get the wrong data. If performance is an issue, one should use SqlBulkCopy, *not* TVPs and SqlDataRecord. There is a huge difference in speed due to the use of batch operations, minimal logging, etc. In any case, trying to save allocations is pointless - this would only work if ADO.NET itself *copied* the data during iteration and no other code tried to enumerate the iterator. – Panagiotis Kanavos May 17 '16 at 14:54
  • @PanagiotisKanavos: In practice, the callers end up getting the right data, since callers process rows individually. Of course, relying on that is dirty. As for the performance benefits, TVPs are faster than bulk inserts if dealing in smaller sets of rows. See Table-Valued Parameters vs. BULK INSERT Operations on [MSDN: Use Table-Valued Parameters](https://msdn.microsoft.com/en-us/library/bb510489.aspx) – Brian May 17 '16 at 19:06
  • @Brian that "in practice" actually means "in very specific circumstances". Your own example shows how this can fail. As for performance benefits - it the rows are so few, why do you care about reusing a SqlDataRecord? – Panagiotis Kanavos May 18 '16 at 07:19
  • @PanagiotisKanavos: I'm not sure it's fair to call it "very specific." I think it's pretty rare for application code to instantiate a SqlDataRecord for anything other than a table-valued parameter. In general, a well-written library which consumes `IENumerable` is unlikely to use eager evaluation, if only because doing so not only forces an extra iteration over the SqlDataRecords but also because lazy evaluation has constant memory use. Relying on implementation details is bad, but it's unlikely to actually break (and if it does, it will probably break immediately). – Brian May 18 '16 at 12:50
1
void Main()
{
    //This code proves that the object is being modified.
   Thing prevRow = null;
    foreach (var curRow in Rows(new List<int>() { 1, 2, 3 }))
    {
        Console.WriteLine(curRow);
        Console.WriteLine(prevRow);
        prevRow = curRow;
    }
    
    //Because the object is modified instead of a new object being returned,
    // this code does something unexpected; it returns the same object 3
    // times! Instead of three unique objects representing the values 1, 2, 3.
    var rowsAsList = Rows(new List<int>() { 1, 2, 3 }).ToList();
    foreach (var curRow in rowsAsList)
    {
        Console.WriteLine(curRow);
    }
}

public class Thing
{
    public int i;
}

IEnumerable<Thing> Rows(List<int> simpletable)
{
    // Bad: Reusing the object will cause problems. Comment out the next line to fix the bug.
    var sqlRow = new Thing() {i=-1};
    
    foreach (int x in simpletable)
    {
        // Good: Do this instead! Uncomment the following line to fix the bug.
        // var sqlRow = new Thing() {i=x};
        sqlRow.i = x;
        yield return sqlRow;
    }
}
Greg Dietsche
  • 862
  • 8
  • 16
  • This demo is largely identical to my example, "As an example of why this is concerning, calling `var x = Rows(new[] { 100, 200}.ToList()).ToList().Dump()` in LinqPad spits out 200,200. " – Brian May 17 '16 at 19:08
  • That's exactly what I was trying to show you and is why the code should yield return a new Thing instead of re-using the same "thing" over and over :) – Greg Dietsche May 17 '16 at 21:31
  • 1
    But MSDN said to reuse the object in remarks?? https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.server.sqldatarecord.aspx – user1249190 Aug 13 '17 at 14:14