12

I have a List containing ids that I want to insert into a temp table using Dapper in order to avoid the SQL limit on parameters in the 'IN' clause.

So currently my code looks like this:

public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        return db.Query<int>(
            @"SELECT a.animalID        
            FROM
            dbo.animalTypes [at]
            INNER JOIN animals [a] on a.animalTypeId = at.animalTypeId
            INNER JOIN edibleAnimals e on e.animalID = a.animalID
            WHERE
            at.animalId in @animalIds", new { animalIds }).ToList();
    }
}

The problem I need to solve is that when there are more than 2100 ids in the animalIds list then I get a SQL error "The incoming request has too many parameters. The server supports a maximum of 2100 parameters".

So now I would like to create a temp table populated with the animalIds passed into the method. Then I can join the animals table on the temp table and avoid having a huge "IN" clause.

I have tried various combinations of syntax but not got anywhere. This is where I am now:

public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        db.Execute(@"SELECT INTO #tempAnmialIds @animalIds");

        return db.Query<int>(
            @"SELECT a.animalID        
            FROM
            dbo.animalTypes [at]
            INNER JOIN animals [a] on a.animalTypeId = at.animalTypeId
            INNER JOIN edibleAnimals e on e.animalID = a.animalID
            INNER JOIN #tempAnmialIds tmp on tmp.animalID = a.animalID).ToList();
    }
}

I can't get the SELECT INTO working with the list of IDs. Am I going about this the wrong way maybe there is a better way to avoid the "IN" clause limit.

I do have a backup solution in that I can split the incoming list of animalIDs into blocks of 1000 but I've read that the large "IN" clause sufferes a performance hit and joining a temp table will be more efficient and it also means I don;t need extra 'splitting' code to batch up the ids in to blocks of 1000.

Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
Chris B
  • 5,311
  • 11
  • 45
  • 57
  • Where is the `INSERT INTO` statement? – Panagiotis Kanavos Sep 20 '16 at 12:06
  • 2100 values is a very big number. Big enough that they should be stored in their *own* table with proper indexing. Where did these values come from? Are they the result of a query? Why not include in the SELECT statement then? Or did they come from an external source, eg a CSV file? Then the best option is to import the file into a staging table, then join with that table – Panagiotis Kanavos Sep 21 '16 at 07:01
  • They come from a CSV – Chris B Sep 21 '16 at 08:46
  • Then import them into a table and join with it, eg using `bcp`, `BULK INSERT` or SSIS. You could also use the [SqlBulkCopy](https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx) class to send rows from the client to the server in bulk. – Panagiotis Kanavos Sep 21 '16 at 08:49

3 Answers3

8

Ok, here's the version you want. I'm adding this as a separate answer, as my first answer using SP/TVP utilizes a different concept.

public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
  using (var db = new SqlConnection(this.connectionString))
  {
    // This Open() call is vital! If you don't open the connection, Dapper will
    // open/close it automagically, which means that you'll loose the created
    // temp table directly after the statement completes.
    db.Open();

    // This temp table is created having a primary key. So make sure you don't pass
    // any duplicate IDs
    db.Execute("CREATE TABLE #tempAnimalIds(animalId int not null primary key);");
    while (animalIds.Any())
    {
      // Build the statements to insert the Ids. For this, we need to split animalIDs
      // into chunks of 1000, as this flavour of INSERT INTO is limited to 1000 values
      // at a time.
      var ids2Insert = animalIds.Take(1000);
      animalIds = animalIds.Skip(1000).ToList();

      StringBuilder stmt = new StringBuilder("INSERT INTO #tempAnimalIds VALUES (");
      stmt.Append(string.Join("),(", ids2Insert));
      stmt.Append(");");

      db.Execute(stmt.ToString());
    }

    return db.Query<int>(@"SELECT animalID FROM #tempAnimalIds").ToList();
  }
}

To test:

var ids = LoadAnimalTypeIdsFromAnimalIds(Enumerable.Range(1, 2500).ToList());

You just need to amend your select statement to what it originally was. As I don't have all your tables in my environment, I just selected from the created temp table to prove it works the way it should.

Pitfalls, see comments:

  • Open the connection at the beginning, otherwise the temp table will be gone after dapper automatically closes the connection right after creating the table.
  • This particular flavour of INSERT INTO is limited to 1000 values at a time, so the passed IDs need to be split into chunks accordingly.
  • Don't pass duplicate keys, as the primary key on the temp table will not allow that.

Edit

It seems Dapper supports a set-based operation which will make this work too:

public IList<int> LoadAnimalTypeIdsFromAnimalIdsV2(IList<int> animalIds)
{
  // This creates an IEnumerable of an anonymous type containing an Id property. This seems
  // to be necessary to be able to grab the Id by it's name via Dapper.
  var namedIDs = animalIds.Select(i => new {Id = i});
  using (var db = new SqlConnection(this.connectionString))
  {
    // This is vital! If you don't open the connection, Dapper will open/close it
    // automagically, which means that you'll loose the created temp table directly
    // after the statement completes.
    db.Open();

    // This temp table is created having a primary key. So make sure you don't pass
    // any duplicate IDs
    db.Execute("CREATE TABLE #tempAnimalIds(animalId int not null primary key);");

    // Using one of Dapper's convenient features, the INSERT becomes:
    db.Execute("INSERT INTO #tempAnimalIds VALUES(@Id);", namedIDs);

    return db.Query<int>(@"SELECT animalID FROM #tempAnimalIds").ToList();
  }
}

I don't know how well this will perform compared to the previous version (ie. 2500 single inserts instead of three inserts with 1000, 1000, 500 values each). But the doc suggests that it performs better if used together with async, MARS and Pipelining.

takrl
  • 6,356
  • 3
  • 60
  • 69
  • This version will perform a lot better due to indexing and statistics. TVPs have no indexes and the optimizer assumes they contain only one row. The `IN` clause is also faster than TVPs – Panagiotis Kanavos Sep 21 '16 at 07:03
  • @PanagiotisKanavos I always thought that if you create the table type with an index or PK, that would be utilized for the TVP. Is that not the case? – takrl Sep 21 '16 at 07:08
  • [SQL Server doesn't maintain statistics](https://msdn.microsoft.com/en-us/library/bb510489.aspx#Anchor_1) for TVPs which results in bad cardinality estimates, ie the optimizer doesn't know how many unique values there are. This can result in inefficient execution plans even though you have a primary key. >2100 rows are enough that I'd benchmark first. Although, with >2100 rows I'd rather use a proper staging table to import and massage the source data – Panagiotis Kanavos Sep 21 '16 at 07:21
  • Thank you, it's the inserting from a list into a table that I am struggling with. I can see now how I should do it. Thanks for your help. – Chris B Sep 21 '16 at 08:45
  • 1
    FYI, if you use OPTION(RECOMPILE) or trace flag 2453, the proper cardinality estimates will be used for table variables. See for example: https://sqlperformance.com/2014/06/t-sql-queries/table-variable-perf-fix – Mark Sowul Jul 03 '18 at 21:26
2

In your example, what I can't see is how your list of animalIds is actually passed to the query to be inserted into the #tempAnimalIDs table.

There is a way to do it without using a temp table, utilizing a stored procedure with a table value parameter.

SQL:

CREATE TYPE [dbo].[udtKeys] AS TABLE([i] [int] NOT NULL)
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[myProc](@data as dbo.udtKeys readonly)AS
BEGIN
    select i from @data;
END
GO

This will create a user defined table type called udtKeys which contains just one int column named i, and a stored procedure that expects a parameter of that type. The proc does nothing else but to select the IDs you passed, but you can of course join other tables to it. For a hint regarding the syntax, see here.

C#:

var dataTable = new DataTable();
dataTable.Columns.Add("i", typeof(int));
foreach (var animalId in animalIds)
    dataTable.Rows.Add(animalId);
using(SqlConnection conn = new SqlConnection("connectionString goes here"))
{
    var r=conn.Query("myProc", new {data=dataTable},commandType: CommandType.StoredProcedure);
    // r contains your results
}

The parameter within the procedure gets populated by passing a DataTable, and that DataTable's structure must match the one of the table type you created.

If you really need to pass more that 2100 values, you may want to consider indexing your table type to increase performance. You can actually give it a primary key if you don't pass any duplicate keys, like this:

CREATE TYPE [dbo].[udtKeys] AS TABLE(
    [i] [int] NOT NULL,
    PRIMARY KEY CLUSTERED 
    (
        [i] ASC
    )WITH (IGNORE_DUP_KEY = OFF)
)
GO

You may also need to assign execute permissions for the type to the database user you execute this with, like so:

GRANT EXEC ON TYPE::[dbo].[udtKeys] TO [User]
GO

See also here and here.

Community
  • 1
  • 1
takrl
  • 6,356
  • 3
  • 60
  • 69
  • Thanks for this, although I'm not really looking to create an SP or make any changes to the database. I'm not entirely sure what those two links you have provided mean, I'm not having an issue with execute permissions. The main problem is forming a way of inserting the list of plain INT values into a table using Dapper – Chris B Sep 20 '16 at 15:19
  • @ChrisB The two links just illustrate the fact that if you go down the SP route, you may need to grant exec privileges on the type you create, which is not obvious if you don't know it. Most people assume that privileges to execute the SP will suffice, which is not the case. – takrl Sep 20 '16 at 16:58
  • Ah ok that is a useful gotcha to know about , thanks – Chris B Sep 21 '16 at 08:15
1

For me, the best way I was able to come up with was turning the list into a comma separated list in C# then using string_split in SQL to insert the data into a temp table. There are probably upper limits to this, but in my case I was only dealing with 6,000 records and it worked really fast.

public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        return db.Query<int>(
            @"  --Created a temp table to join to later. An index on this would probably be good too.
                CREATE TABLE #tempAnimals (Id INT)
                INSERT INTO #tempAnimals (ID)
                SELECT value FROM string_split(@animalIdStrings)

                SELECT at.animalTypeID        
                FROM dbo.animalTypes [at]
                JOIN animals [a] ON a.animalTypeId = at.animalTypeId
                JOIN #tempAnimals temp ON temp.ID = a.animalID -- <-- added this
                JOIN edibleAnimals e ON e.animalID = a.animalID", 
            new { animalIdStrings = string.Join(",", animalIds) }).ToList();
    }
}

It might be worth noting that string_split is only available in SQL Server 2016 or higher or if using Azure SQL then compatibility mode 130 or higher. https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver15

wilsjd
  • 2,178
  • 2
  • 23
  • 37
  • This solution worked and got past the INSERT INTO 1000 row limit. Only two issues: 1) there's a missing parameter on string_split where you need to specify the character to split on: `string_split(@animalIdStrings,',')`; and 2) this is not SQL injection safe. – CokoBWare Jul 04 '23 at 18:51
  • Yeah it’s really not very memory efficient either. I ended up going a different direction for these because of that. The sql injection didn’t matter to me because it was just IDs that I was selecting anyway. – wilsjd Jul 04 '23 at 22:59
  • I’ll post a different solution here later that uses XML that we use as well to avoid this. – wilsjd Jul 04 '23 at 23:00
  • I just looked at our XML piece and it really isn't much different than what I am doing above actually. – wilsjd Jul 05 '23 at 20:53