
I'm trying to insert records using a high performance table parameter method ( http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/ ), and I'm curious if it's possible to retrieve back the identity values for each record I insert.

At the moment, the answer appears to be no - I insert the data, then retrieve the identity values, and they don't match. Specifically, the values are mismatched about 75% of the time, and the mismatches are unpredictable. Here's some code that replicates this issue:

// Create a datatable with 100k rows
DataTable dt = new DataTable();
dt.Columns.Add(new DataColumn("item_id", typeof(int)));
dt.Columns.Add(new DataColumn("comment", typeof(string)));
for (int i = 0; i < 100000; i++) {
    dt.Rows.Add(new object[] { 0, i.ToString() });
}

// Insert these records and retrieve back the identity
using (SqlConnection conn = new SqlConnection("Data Source=localhost;Initial Catalog=testdb;Integrated Security=True")) {
    conn.Open();
    using (SqlCommand cmd = new SqlCommand("proc_bulk_insert_test", conn)) {
        cmd.CommandType = CommandType.StoredProcedure;

        // Adding a "structured" parameter allows you to insert tons of data with low overhead
        SqlParameter param = new SqlParameter("@mytable", SqlDbType.Structured);
        param.Value = dt;
        cmd.Parameters.Add(param);
        SqlDataReader dr = cmd.ExecuteReader();

        // Set all the records' identity values
        int i = 0;
        while (dr.Read()) {
            dt.Rows[i]["item_id"] = dr.GetInt32(0);
            i++;
        }
        dr.Close();
    }

    // Do all the records' ID numbers match what I received back from the database?
    using (SqlCommand cmd = new SqlCommand("SELECT * FROM bulk_insert_test WHERE item_id >= @base_identity ORDER BY item_id ASC", conn)) {
        cmd.Parameters.AddWithValue("@base_identity", (int)dt.Rows[0].ItemArray[0]);
        SqlDataReader dr = cmd.ExecuteReader();
        DataTable dtresult = new DataTable();
        dtresult.Load(dr);
    }
}

The database is defined using this SQL Server script:

CREATE TABLE bulk_insert_test (
    item_id int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
    comment varchar(20)
)
GO

CREATE TYPE bulk_insert_table_type AS TABLE ( item_id int, comment varchar(20) )
GO

CREATE PROCEDURE proc_bulk_insert_test
    @mytable bulk_insert_table_type READONLY
AS

DECLARE @TableOfIdentities TABLE (IdentValue INT)

INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id INTO @TableOfIdentities(IdentValue)
SELECT comment FROM @mytable

SELECT * FROM @TableOfIdentities

Here's the problem: the values returned from proc_bulk_insert_test are not in the same order as the original records were inserted. Therefore, I can't programmatically assign each record the item_id value I received back from the OUTPUT statement.

It seems like the only valid solution is to SELECT back the entire list of records I just inserted, but frankly I'd prefer any solution that would reduce the amount of data piped across my SQL Server's network card. Does anyone have better solutions for large inserts while still retrieving identity values?

EDIT: Let me try clarifying the question a bit more. The problem is that I would like my C# program to learn which identity values SQL Server assigned to the data I just inserted. The order isn't essential, but I would like to be able to take an arbitrary set of records within C#, insert them using the fast table parameter method, and then assign their auto-generated ID numbers in C# without having to requery the entire table back into memory.

Given that this is an artificial test set, I attempted to condense it into as small and readable a piece of code as possible. Let me describe the methods I have used to try to resolve this issue:

  1. In my original code, in the application this example came from, I would insert about 15 million rows using 15 million individual insert statements, retrieving the identity value after each insert. This worked, but it was slow.
  2. I revised the code to use high performance table parameters for insertion. I would then dispose of all of the objects in C# and read the entire objects back from the database. However, the original records had dozens of columns with lots of varchar and decimal values, so this method was very network intensive, although it was fast and it worked.
  3. I then began researching whether it was possible to use the table parameter insert while asking SQL Server to just report back the identity values. I tried scope_identity() and OUTPUT but haven't been successful with either so far.

Basically, this problem would be solved if SQL Server would always insert the records in exactly the order I provided them. Is it possible to make SQL server insert records in exactly the order they are provided in a table value parameter insert?

EDIT2: This approach seems very similar to what Cade Roux cites below:

http://www.sqlteam.com/article/using-the-output-clause-to-capture-identity-values-on-multi-row-inserts

However, in the article the author uses a magic unique value, "ProductNumber", to connect the rows returned by the OUTPUT clause back to the rows in the original table value parameter. I'm trying to figure out how to do this if my table doesn't have a magic unique value.
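
For reference, here's roughly what the article's technique looks like applied to the schema above, using the comment column in place of "ProductNumber" (this only works because comment happens to be unique in this artificial test):

-- Return both the generated identity and a unique, client-known value so each
-- row can be matched back in C# without relying on order (assumes comment is unique)
INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id, Inserted.comment
SELECT comment FROM @mytable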

Ted Spence
  • what type of Indexing do you have on the table.. is there a way to do a stored procedure to get the MaxId meaning the Identity column of the last updated row..? I hope that I am understanding your question correctly..? basically return @@IDENTITY – MethodMan Aug 23 '12 at 17:23
  • @DJKRAZE: [don't use @@IDENTITY - use SCOPE_IDENTITY() instead!](http://blog.sqlauthority.com/2007/03/25/sql-server-identity-vs-scope_identity-vs-ident_current-retrieve-last-inserted-identity-of-record/) – marc_s Aug 23 '12 at 17:25
  • I originally tried this with scope_identity() - the results were unpredictable as well. Out of the 100k rows, it seemed like a random segment of 10,000 rows would be inserted in the wrong order. – Ted Spence Aug 23 '12 at 17:26
  • @DJKRAZE - the entire table definition is in the SQL script above; I created the table on a blank database using exactly that script. Feel free to try it out if you'd like to test it yourself :) – Ted Spence Aug 23 '12 at 17:26
  • Cade Roux is absolutely correct.. I didn't notice that the OP has no Order By Clause – MethodMan Aug 23 '12 at 17:31
  • Couldn't you do a CURSOR? Saves the network traffic of individual calls, but still you do individual inserts basically, only at the server side. – Wout Sep 19 '13 at 14:37

2 Answers


Your TVP is an unordered set, just like a regular table. It only has an order when you specify one. Not only do you not have any way to indicate actual order here, you're also just doing a SELECT * at the end with no ORDER BY. What order do you expect here? You've told SQL Server, effectively, that you don't care. That said, I implemented your code and had no problems getting the rows back in the right order. I modified the procedure slightly so that you can actually tell which identity value belongs to which comment:

DECLARE @TableOfIdentities TABLE (IdentValue INT, comment varchar(20))

INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id, Inserted.comment 
INTO @TableOfIdentities(IdentValue, comment)
SELECT comment FROM @mytable

SELECT * FROM @TableOfIdentities

Then I called it using this code (we don't need all the C# for this):

DECLARE @t bulk_insert_table_type;
INSERT @t VALUES(5,'foo'),(2,'bar'),(3,'zzz');
SELECT * FROM @t;

EXEC dbo.proc_bulk_insert_test @t;

Results:

1   foo
2   bar
3   zzz

If you want to make sure the output is in the order of identity assignment (which isn't necessarily the same "order" that your unordered TVP has), you can add an ORDER BY on the captured identity column (IdentValue in the modified procedure above) to the last select in your procedure.
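
For example, the final select in the modified procedure above would become:

SELECT IdentValue, comment
FROM @TableOfIdentities
ORDER BY IdentValue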

If you want to insert into the destination table so that your identity values are in an order that is important to you, then you have a couple of options:

  • add a column to your TVP and insert the order into that column, then use a cursor to iterate over the rows in that order, and insert one at a time. Still more efficient than calling the entire procedure for each row, IMHO.

  • add a column to your TVP that indicates order, and use an ORDER BY on the insert. This isn't guaranteed, but is relatively reliable, particularly if you eliminate parallelism issues using MAXDOP 1; a rough sketch follows after this list.
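
Here is a rough sketch of that second option, assuming a redefined table type with a hypothetical sort_order column that the client fills in with the order it cares about (illustrative only; as noted above, this influences identity assignment but is not an absolute guarantee):

CREATE TYPE bulk_insert_table_type_ordered AS TABLE ( sort_order int, comment varchar(20) )
GO

CREATE PROCEDURE proc_bulk_insert_test_ordered
    @mytable bulk_insert_table_type_ordered READONLY
AS

DECLARE @TableOfIdentities TABLE (IdentValue INT, comment varchar(20))

-- ORDER BY on the INSERT ... SELECT influences the order in which identity
-- values are generated; OPTION (MAXDOP 1) avoids reordering due to parallelism
INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id, Inserted.comment
INTO @TableOfIdentities(IdentValue, comment)
SELECT comment FROM @mytable
ORDER BY sort_order
OPTION (MAXDOP 1)

SELECT IdentValue, comment FROM @TableOfIdentities ORDER BY IdentValue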

In any case, you seem to be placing a lot of importance on order. What does your order actually mean? If you want to attach meaning to the order, you shouldn't be doing so using an IDENTITY column.

Aaron Bertrand
  • If I'm understanding this correctly, your solution requires that I read back the entire data set that I just inserted - that's basically what my current solution does. Is it possible to just get the identity values though? – Ted Spence Aug 23 '12 at 17:49
  • Yes, but what do the identity values mean by themselves? What are you going to do with them? – Aaron Bertrand Aug 23 '12 at 17:50
  • Anyway I only added those so you can see which identity value belonged to which table, not as a part of the "solution" - I still don't think I understand the problem because I think there are some misconceptions about order. – Aaron Bertrand Aug 23 '12 at 18:00
  • I'm asking for general knowledge. My applications often use the identity values in order to match relational tables. Currently, I have to mix table value parameters (to get high performance inserts) with complex stored procedures (to preserve identity values and relational data). However, some of my applications have to parse through 10-30 GB of data and match records from file A with records from file B; I'd like to be able to do that all in C# by using fast inserts and retrieving back the identity numbers. – Ted Spence Aug 23 '12 at 18:00
  • You are far better off matching on natural, identifiable values than relying on any assumptions about order. There are no guarantees about how identity values are assigned unless you insert individual rows in a loop. – Aaron Bertrand Aug 23 '12 at 18:02
  • I have a similar problem, but I guess it's not the order that's crucial; in the application I need each inserted record to get the Id it got assigned in the database. And the easiest way to do that on the application side is to get the Id's in the same order as the records were sent to the database, because then you just match the Id array with the original record array with minimal fuss. I guess the only way to get this done (with my limited DB knowledge, I'm a programmer) is to do a cursor and simply loop over the records. – Wout Sep 26 '13 at 15:28
  • @Wout I'm not sure I understand. Is there nothing unique that the application knows about these rows that it can match from the `OUTPUT` clause (which can include input data as well as the generated `IDENTITY` values)? A cursor and a loop sounds like a very inefficient, unwise and unnecessary solution. – Aaron Bertrand Sep 26 '13 at 15:32
  • Our record is a bit more complex than this, but for argument's sake suppose it is just Person record with Id int and Name varchar columns, where the Id is an auto number, and the Name a non-unique string (2 guys can have name Mark). Ideally, we'd just send the names over, and get the Id's back (because other records refer to them, and sub sequent db inserts have to be done with those id's). I guess in theory it'd even be ok to switch the id's around if the names are identical, but that doesn't quite feel right. Not sure if I could get away with that even – Wout Oct 02 '13 at 12:23

You specify no ORDER BY on this: SELECT * FROM @TableOfIdentities, so there's no guarantee of order. If you want them in the same order they were sent, do an INNER JOIN from that table back to the data that was inserted, with an ORDER BY that matches the order in which the rows were sent.
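
A minimal sketch of that idea, assuming a hypothetical NATURAL_KEY column (as discussed in the comments below) is added to both the table type and the destination table and is filled in by the client in a known order:

DECLARE @TableOfIdentities TABLE (IdentValue INT)

INSERT INTO bulk_insert_test (NATURAL_KEY, comment)
OUTPUT Inserted.item_id INTO @TableOfIdentities(IdentValue)
SELECT NATURAL_KEY, comment FROM @mytable

-- Join the captured identities back to the inserted rows and return them
-- in the order the client sent them (the order of NATURAL_KEY)
SELECT i.IdentValue
FROM @TableOfIdentities AS i
INNER JOIN bulk_insert_test AS d ON d.item_id = i.IdentValue
ORDER BY d.NATURAL_KEY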

Cade Roux
  • If I put an `order by` clause in the `select * from @tableofidentities`, that won't affect the insert, will it? If SQL Server decides to insert data in a random order, how can I ensure that the identities come back in a matching order? – Ted Spence Aug 23 '12 at 18:02
  • @TedSpence No. And in fact the INSERT can be done in parallel for efficiency. Say you load your TVP from the client in a known order, say it will become column NATURAL_KEY. Then `SELECT i.IdentValue FROM @TableOfIdentities AS i INNER JOIN bulk_insert_test AS d ON d.item_id = i.IdentValue ORDER BY d.NATURAL_KEY ` – Cade Roux Aug 23 '12 at 18:29
  • @TedSpence If your client is in an order (i.e. an array), but no particularly identifiable and reproducible/definable order and no way to refer to them by natural key and you need to get the identities back onto those data structures in the array, you're kind of out of luck unless you send that indexer into the database to be held in the table and then returned. – Cade Roux Aug 23 '12 at 18:31
  • @TedSpence If you abandon the batch insert, you can control the order but you lose the efficiency. Another way to load is with SqlBulkCopy, from the client, but I don't think it has any special return handling. I'm still a little mystified about the importance of having this huge number of identities updated back in the client. – Cade Roux Aug 23 '12 at 18:44
  • Let's say I'm receiving sales data for 10,000 products, for 3000 stores, daily, for a month. My code attempts to batch insert this data as rapidly as possible, but we want to preserve the connections between each product, each store, and its daily data by using high performance integer (or bigint) foreign key references. – Ted Spence Aug 23 '12 at 19:26
  • Also, I have to do significant amounts of post-processing work on each record, which is why the data is loaded directly into a C# application. – Ted Spence Aug 23 '12 at 19:33
  • @TedSpence Sounds like data warehousing. Have you looked at those concepts? Dimensional modeling and everything? Is that 900m rows a month? That seems like a lot to keep in memory. – Cade Roux Aug 23 '12 at 20:54
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/15737/discussion-between-ted-spence-and-cade-roux) – Ted Spence Aug 23 '12 at 21:12