I'm trying to insert records using a high performance table parameter method ( http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/ ), and I'm curious if it's possible to retrieve back the identity values for each record I insert.
At the moment, the answer appears to be no - I insert the data, then retrieve back the identity values, and they don't match. Specifically, they don't match about 75% of the time, and they don't match in unpredictable ways. Here's some code that replicates this issue:
// Create a datatable with 100k rows
DataTable dt = new DataTable();
dt.Columns.Add(new DataColumn("item_id", typeof(int)));
dt.Columns.Add(new DataColumn("comment", typeof(string)));
for (int i = 0; i < 100000; i++) {
dt.Rows.Add(new object[] { 0, i.ToString() });
}
// Insert these records and retrieve back the identity
using (SqlConnection conn = new SqlConnection("Data Source=localhost;Initial Catalog=testdb;Integrated Security=True")) {
conn.Open();
using (SqlCommand cmd = new SqlCommand("proc_bulk_insert_test", conn)) {
cmd.CommandType = CommandType.StoredProcedure;
// Adding a "structured" parameter allows you to insert tons of data with low overhead
SqlParameter param = new SqlParameter("@mytable", SqlDbType.Structured);
param.Value = dt;
cmd.Parameters.Add(param);
SqlDataReader dr = cmd.ExecuteReader();
// Set all the records' identity values
int i = 0;
while (dr.Read()) {
dt.Rows[i].ItemArray = new object[] { dr.GetInt32(0), dt.Rows[i].ItemArray[1] };
i++;
}
dr.Close();
}
// Do all the records' ID numbers match what I received back from the database?
using (SqlCommand cmd = new SqlCommand("SELECT * FROM bulk_insert_test WHERE item_id >= @base_identity ORDER BY item_id ASC", conn)) {
cmd.Parameters.AddWithValue("@base_identity", (int)dt.Rows[0].ItemArray[0]);
SqlDataReader dr = cmd.ExecuteReader();
DataTable dtresult = new DataTable();
dtresult.Load(dr);
}
}
The database is defined using this SQL server script:
CREATE TABLE bulk_insert_test (
item_id int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
comment varchar(20)
)
GO
CREATE TYPE bulk_insert_table_type AS TABLE ( item_id int, comment varchar(20) )
GO
CREATE PROCEDURE proc_bulk_insert_test
@mytable bulk_insert_table_type READONLY
AS
DECLARE @TableOfIdentities TABLE (IdentValue INT)
INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id INTO @TableOfIdentities(IdentValue)
SELECT comment FROM @mytable
SELECT * FROM @TableOfIdentities
Here's the problem: the values returned from proc_bulk_insert_test
are not in the same order as the original records were inserted. Therefore, I can't programmatically assign each record the item_id
value I received back from the OUTPUT
statement.
It seems like the only valid solution is to SELECT
back the entire list of records I just inserted, but frankly I'd prefer any solution that would reduce the amount of data piped across my SQL Server's network card. Does anyone have better solutions for large inserts while still retrieving identity values?
EDIT: Let me try clarifying the question a bit more. The problem is that I would like my C# program to learn what identity values SQL Server assigned to the data that I just inserted. The order isn't essential; but I would like to be able to take an arbitrary set of records within C#, insert them using the fast table parameter method, and then assign their auto-generated ID numbers in C# without having to requery the entire table back into memory.
Given that this is an artificial test set, I attempted to condense it into as small of a readable bit of code as possible. Let me describe what methods I have used to resolve this issue:
- In my original code, in the application this example came from, I would insert about 15 million rows using 15 million individual insert statements, retrieving back the identity value after each insert. This worked but was slow.
- I revised the code using high performance table parameters for insertion. I would then dispose of all of the objects in C#, and read back from the database the entire objects. However, the original records had dozens of columns with lots of varchar and decimal values, so this method was very network traffic intensive, although it was fast and it worked.
- I now began research to figure out whether it was possible to use the table parameter insert, while asking SQL Server to just report back the identity values. I tried
scope_identity()
andOUTPUT
but haven't been successful so far on either.
Basically, this problem would be solved if SQL Server would always insert the records in exactly the order I provided them. Is it possible to make SQL server insert records in exactly the order they are provided in a table value parameter insert?
EDIT2: This approach seems very similar to what Cade Roux cites below:
However, in the article, the author uses a magic unique value, "ProductNumber", to connect the inserted information from the "output" value to the original table value parameter. I'm trying to figure out how to do this if my table doesn't have a magic unique value.