
I know I can do a bulk insert into my table with an identity column by not specifying the SqlBulkCopyOptions.KeepIdentity option, as mentioned here.

What I would like to be able to do is get the identity values that the server generates and put them in my DataTable, or even a list. I saw this post, but I want my code to be general, and I can't have a version column in all my tables. Any suggestions are much appreciated. Here is my code:

public void BulkInsert(DataTable dataTable, string DestinationTbl, int batchSize)
{
    // Get the DataTable 
    DataTable dtInsertRows = dataTable;

    using (SqlBulkCopy sbc = new SqlBulkCopy(sConnectStr))
    {
        sbc.DestinationTableName = DestinationTbl;

        // Number of records to be processed in one go
        sbc.BatchSize = batchSize;

        // Add your column mappings here
        foreach (DataColumn dCol in dtInsertRows.Columns)
        {
            sbc.ColumnMappings.Add(dCol.ColumnName, dCol.ColumnName);
        }

        // Finally write to server
        sbc.WriteToServer(dtInsertRows);
    }
}
dseiple
  • I did this by adding another column to the destination table called "OriginalKey" and then, after the SqlBulkCopy completed, selecting OriginalKey and NewIdentityId and dropping the column. I believe you can also SqlBulkCopy to a temp table and then INSERT with the OUTPUT clause, but now you're doing two steps instead of one. Or skip SqlBulkCopy and use a TVP in an INSERT ... OUTPUT statement. – ta.speot.is Feb 10 '14 at 21:12
  • Have you considered creating an INSERT Trigger on your table and copying the identity values to a temporary table that your C# code can query against? – Rick S Feb 10 '14 at 21:22
  • @ta.speot.is If I have a large table (several million records) what effect will adding and dropping a column have on performance? Would this work with concurrent processes inserting to the same table? – dseiple Feb 10 '14 at 21:23
  • Schema lock on the table which will basically serialize access to the table across all the processes inserting. – ta.speot.is Feb 10 '14 at 21:25

3 Answers


AFAIK, you can't.

The only way (that I know of) to get the value(s) of the identity field is by using either SCOPE_IDENTITY() when you insert row-by-row, or the OUTPUT approach when inserting an entire set.
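For the row-by-row shape, a minimal sketch (the table and column names here are hypothetical, not from the question):

using (SqlConnection conn = new SqlConnection(sConnectStr))
using (SqlCommand cmd = new SqlCommand(
    // Hypothetical table MyTable (Id int identity, Name nvarchar(50)).
    // SCOPE_IDENTITY() returns the identity generated by the last INSERT
    // in the current scope, so we select it in the same batch.
    "INSERT INTO MyTable (Name) VALUES (@name); SELECT CAST(SCOPE_IDENTITY() AS int);",
    conn))
{
    conn.Open();
    cmd.Parameters.AddWithValue("@name", "example");
    int newId = (int)cmd.ExecuteScalar();   // the server-generated identity value
}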

The 'simplest' approach would probably be to SqlBulkCopy the records into the table and then fetch them back again later on. The problem is that it could be hard to properly (and quickly) fetch those rows from the server again; e.g. it would be rather ugly (and slow) to have a WHERE clause with IN (guid1, guid2, .., guid999998, guid999999) =)

I'm assuming performance is an issue here, as you're already using SqlBulkCopy, so I'd suggest going for the OUTPUT approach. In that case you'll first need a staging table to SqlBulkCopy your records into. Said table should then include some kind of batch identifier (a GUID?) so as to allow multiple threads to run side by side. You'll need a stored procedure to INSERT <table> OUTPUT inserted.* SELECT the data from the staging table into the actual destination table, and also clean up the staging table again. The returned recordset from said procedure would then match 1:1 with the original dataset responsible for filling the staging table, but of course you should NOT rely on its order. In other words: your next challenge will then be matching the returned identity fields back to the original records in your application.
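A rough sketch of that pattern, with the INSERT ... OUTPUT issued inline rather than wrapped in a stored procedure; the staging table, batch column and data column names (MyTable_Staging, BatchId, Name) are hypothetical placeholders:

// Assumes MyTable_Staging has the same columns as MyTable plus a
// BatchId uniqueidentifier column, and that dataTable carries a BatchId column.
Guid batchId = Guid.NewGuid();
foreach (DataRow row in dataTable.Rows)
    row["BatchId"] = batchId;                  // stamp every row with this batch's id

using (SqlConnection conn = new SqlConnection(sConnectStr))
{
    conn.Open();

    // Step 1: bulk copy into the staging table.
    using (SqlBulkCopy sbc = new SqlBulkCopy(conn))
    {
        sbc.DestinationTableName = "MyTable_Staging";
        sbc.WriteToServer(dataTable);
    }

    // Step 2: move the staged rows into the real table, streaming the
    // generated ids (plus enough data to match on) back to the client.
    string sql = @"
        INSERT INTO MyTable (Name)
        OUTPUT inserted.Id, inserted.Name
        SELECT Name FROM MyTable_Staging WHERE BatchId = @batchId;
        DELETE FROM MyTable_Staging WHERE BatchId = @batchId;";

    using (SqlCommand cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.Add("@batchId", SqlDbType.UniqueIdentifier).Value = batchId;
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                int id = reader.GetInt32(0);        // generated identity
                string name = reader.GetString(1);  // match this back to your data
            }
        }
    }
}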

Thinking things over, I'd say that in all cases (except the row-by-row & SCOPE_IDENTITY() approach, which is going to be dog-slow) you'll need to have (or add) a 'key' to your data to link the generated ids back to the original data =/

deroby

You can take an approach similar to the one described above by deroby, but instead of retrieving the rows back via a WHERE IN (guid1, etc...), you match them back up to the rows inserted in memory based on their order.

So I would suggest adding a column to the table to match the rows to a SqlBulkCopy transaction, and then doing the following to match the generated Ids back to the in-memory collection of rows you just inserted.

  • Create a new Guid and set this value on all the rows in the bulk copy mapping to the new column

  • Run the WriteToServer method of the BulkCopy object

  • Retrieve all the rows that have that same key

  • Iterate through this list, which will be in the order in which the rows were added; these will be in the same order as the in-memory collection of rows, so you will then know the generated id for each item.

This will give you better performance than giving each individual row a unique key. So after you bulk insert the data table you could do something like this (in my example I have a list of objects from which I create the data table, and then map the generated ids back to them):

List<myObject> myCollection = new List<myObject>();

Guid identifierKey = Guid.NewGuid();

//Do your bulk insert where all the rows inserted have the identifierKey
//set on the new column. In this example you would create a data table based
//off the myCollection object.

//Identifier is a column specifically for matching a group of rows to a sql
//bulk copy command
var myAddedRows = myDbContext.DatastoreRows.AsNoTracking()
            .Where(d => d.Identifier == identifierKey)
            .ToList();

for (int i = 0; i < myAddedRows.Count; i++)
{
    var savedRow = myAddedRows[i];
    var inMemoryRow = myCollection[i];

    int generatedId = savedRow.Id;

    //Now you know the generatedId for the in memory object, so you could
    //set a property on it to store the value
    inMemoryRow.GeneratedId = generatedId;
}
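The bulk insert that the comment above glosses over could look something like this; the column names and the shape of myObject are assumptions for illustration only:

// Build the data table from the in-memory collection, stamping every row
// with the same identifierKey in the extra "Identifier" matching column.
// (myObject having a Name property is an assumption for this sketch.)
DataTable table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Identifier", typeof(Guid));

foreach (myObject item in myCollection)
{
    table.Rows.Add(item.Name, identifierKey);
}

using (SqlBulkCopy sbc = new SqlBulkCopy(sConnectStr))
{
    sbc.DestinationTableName = "DatastoreRows";
    sbc.ColumnMappings.Add("Name", "Name");
    sbc.ColumnMappings.Add("Identifier", "Identifier");
    sbc.WriteToServer(table);
}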
user2945722
    Although this probably works in 99% of the cases, I do wonder if SqlBulkCopy guarantees it retains the original order when being loaded and/or when sending the data to the RDBMS. But, even when it does, your fetch then should make sure it retrieves the values in the order of the identity fields. As your current query does not enforce any order you're simply asking for the set of rows in any given order. Depending on the table-layout you might (often) get the data back in the order of the identity field, but there's no guarantee! Once multi-threading kicks in you're going to be bitten!! – deroby Dec 03 '15 at 09:44
  • A more sure-fire approach that doesn't require the addition of a new column to the table to hold a GUID is to use a DataSet containing two DataTables and set up a ForeignKeyConstraint with an UpdateRule set to "Cascade" as per my suggestion below. – Nel Prinsloo Mar 26 '23 at 01:33
  • I don't understand why one would do it in such a roundabout way, relying on rows being inserted in the correct order, when you can just rely on the guaranteed and built-in capability of adding the two tables to a DataSet with a foreign key constraint between the parent and child rows. Then SQL guarantees that it will be correct, and you don't have to bother with adding extra columns to your table for the mere sake of matching the original parents to the correct children. – Nel Prinsloo Apr 04 '23 at 08:08

This method doesn't require the addition of any unnecessary columns, and relies on the built-in functionality of the DataSet to maintain a foreign key constraint between the parent row Id and the child row HeaderId. It doesn't rely on the order in which rows are inserted into the database, and you only need two WriteToServer calls: one for the parent rows and another for all the child rows.

Your source DataTable has a specified identity column that you can set up when you create it. Then you add your source (header) table, plus the detail table where you want to stamp the related HeaderId, to a DataSet, and you create a ForeignKeyConstraint with an UpdateRule of Rule.Cascade.

When you call WriteToServer for your header table's data, the related column as specified by the ForeignKeyConstraint automatically gets updated if UpdateRule is set to Cascade.

Below are some snippets that should give you an idea of what works very well for me. For a small example solution refer to this example code on GitHub: BulkInsertHeaderDetail.

internal static class ClaimsDataSet
{
    static DataSet _claimsDataSet = new DataSet();
    static int _currentHeaderId = 0;

    static ClaimsDataSet()
    {
        _claimsDataSet.Tables.Add("Header");
        _claimsDataSet.Tables["Header"].Columns.Add("Id", typeof(int)).AutoIncrement = true;
        _claimsDataSet.Tables["Header"].Columns["Id"].AutoIncrementSeed = 1;
        _claimsDataSet.Tables["Header"].Columns["Id"].AutoIncrementStep = 1;
        _claimsDataSet.Tables["Header"].Columns["Id"].Unique = true;
        _claimsDataSet.Tables["Header"].Columns.Add("TreatmentDate", typeof(System.DateTime));

        // Code omitted for brevity. Refer to sample app on github for all source code



        _claimsDataSet.Tables.Add("Detail");
        _claimsDataSet.Tables["Detail"].Columns.Add("Id", typeof(int)).AutoIncrement = true;
        _claimsDataSet.Tables["Detail"].Columns["Id"].AutoIncrementSeed = 1;
        _claimsDataSet.Tables["Detail"].Columns["Id"].AutoIncrementStep = 1;
        _claimsDataSet.Tables["Detail"].Columns["Id"].Unique = true;
        _claimsDataSet.Tables["Detail"].Columns.Add("HeaderId", typeof(int));
        _claimsDataSet.Tables["Detail"].Columns.Add("TreatmentDate", typeof(System.DateTime));
        // Code omitted for brevity. Refer to sample app on github for all source code

        ForeignKeyConstraint foreignKeyConstraint = new ForeignKeyConstraint("HeaderIdConstraint", _claimsDataSet.Tables["Header"].Columns["Id"], _claimsDataSet.Tables["Detail"].Columns["HeaderId"]);
        foreignKeyConstraint.UpdateRule = Rule.Cascade;
        _claimsDataSet.Tables["Detail"].Constraints.Add(foreignKeyConstraint);
    }

    internal static int AddHeaderRow(string rowContent)
    {
        string errorMessage;
        DateTime workingDate;
        decimal workingAmount;
        string[] commaSeparatedValues = ParseCSVLine(rowContent);
        DataRow row = _claimsDataSet.Tables["Header"].NewRow();

        if (DateTime.TryParse(commaSeparatedValues[0], out workingDate))
        {
            row["TreatmentDate"] = workingDate;
        }
        else
        {
            errorMessage = String.Format("Error converting string content to date value in {0}, Column: {1}", "Treatment Header", "TreatmentDate");
            Console.WriteLine(errorMessage);
            throw new FormatException(errorMessage);
        }
        row["Beneficiary"] = commaSeparatedValues[1];
        row["ServiceProvider"] = commaSeparatedValues[2];

        // Code omitted for brevity. Refer to sample app on github for all source code


        _claimsDataSet.Tables["Header"].Rows.Add(row);
        _currentHeaderId = Int32.Parse(row["Id"].ToString());
        return _currentHeaderId;
    }

    internal static void AddDetailRow(string rowContent)
    {
        string errorMessage = "";
        DateTime workingDate;
        Decimal workingAmount;
        string[] commaSeparatedValues = ParseCSVLine(rowContent);
        DataRow row = _claimsDataSet.Tables["Detail"].NewRow();
        row["HeaderId"] = _currentHeaderId;
        if (DateTime.TryParse(commaSeparatedValues[0], out workingDate))
        {
            row["TreatmentDate"] = workingDate;
        }
        else
        {
            errorMessage = String.Format("Error converting string content to date value in {0}, Column: {1}", "Treatment Detail", "TreatmentDate");
            Console.WriteLine(errorMessage);
            throw new FormatException(errorMessage);
        }
        row["TariffCode"] = commaSeparatedValues[1];
        row["TariffDescription"] = commaSeparatedValues[2];
        
        // Code omitted for brevity. Refer to sample app on github for all source code
        
        
        _claimsDataSet.Tables["Detail"].Rows.Add(row);
    }

    internal static void WriteToTargetDatabase()
    {
        try
        {
            string connectionString = ConfigurationManager.ConnectionStrings["claimAdminConnectionString"].ConnectionString;
            using (SqlConnection destinationConnection = new SqlConnection(connectionString))
            {
                destinationConnection.Open();
                ConnectionState destState = destinationConnection.State;
                SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnection);
                bulkCopy.ColumnMappings.Add("TreatmentDate", "TreatmentDate");
                
                // Code omitted for brevity. Refer to sample app on github for all source code
                

                bulkCopy.DestinationTableName = "dbo.ClaimHeader";
                bulkCopy.WriteToServer(_claimsDataSet.Tables["Header"]);

                bulkCopy.ColumnMappings.Clear();
                bulkCopy.ColumnMappings.Add("HeaderId", "HeaderId");
                bulkCopy.ColumnMappings.Add("TreatmentDate", "ClaimDate");
                bulkCopy.ColumnMappings.Add("TariffCode", "TariffCode");
                
                // Code omitted for brevity. Refer to sample app on github for all source code
                
                
                bulkCopy.DestinationTableName = "dbo.ClaimDetail";
                bulkCopy.WriteToServer(_claimsDataSet.Tables["Detail"]);
                destinationConnection.Close();
            }
        }
        catch (Exception ex) 
        {
            Console.WriteLine("Problem with bulk copy operation...");
            Console.WriteLine(ex.Message);
        }
    }

    internal static string[] ParseCSVLine(string inputString)
    {
        
        // Code omitted for brevity. Refer to sample app on github for all source code
        
        
        //takes inputString and splits it into array of strings split at commas
        
        // Convert list to array and return.
        return finalValue.ToArray();
    }
}
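For completeness, a rough sketch of how the class above might be driven from a CSV import loop; the file name and the header/detail line-classification rule are assumptions (the sample app on GitHub contains the real logic):

// Hypothetical driver: header lines are assumed to start with "H," and
// detail lines with "D,"; each header's detail rows follow it in the file.
foreach (string line in File.ReadLines("claims.csv"))
{
    if (line.StartsWith("H,"))
        ClaimsDataSet.AddHeaderRow(line.Substring(2));
    else
        ClaimsDataSet.AddDetailRow(line.Substring(2));  // stamped with the current header's Id
}

// Two bulk copies: headers first, then details. The cascading
// ForeignKeyConstraint keeps Detail.HeaderId consistent with Header.Id.
ClaimsDataSet.WriteToTargetDatabase();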
Nel Prinsloo