
I am using C# with SqlBulkCopy, but I have a problem: I need to do a mass insert into one table, then another mass insert into a second table.

These 2 have a PK/FK relationship.

Table A
Field1 - PK, auto-incrementing (easy to do with SqlBulkCopy, as it's straightforward)

Table B
Field1 - PK/FK - This field creates the relationship and is also the PK of this table. It is not auto-incrementing and needs to have the same id as the corresponding row in Table A.

So these tables have a one-to-one relationship, but I am unsure how to get back all the PK ids that the mass insert generated, since I need them for Table B.

Edit

Could I do something like this?

SELECT * 
FROM Product
WHERE NOT EXISTS (SELECT * FROM ProductReview WHERE Product.ProductId = ProductReview.ProductId AND Product.Qty IS NULL AND Product.ProductName != 'Ipad')

This should find all the rows that were just inserted with the SQL bulk copy. I am not sure how to take the results from this query and then do a mass insert with them from a stored procedure.

The only problem I can see with this is that if a user is inserting records one at a time while this statement runs, it could try to insert a row twice into the ProductReview table.

So say one user is using the manual way and another user is doing the mass way at about the same time.

Manual way:

  1. User submits data.
  2. A LINQ to SQL Product object is created, filled with the data, and submitted.
  3. This object now contains the ProductId.
  4. Another LINQ to SQL object is created for the ProductReview table and inserted (the ProductId from step 3 is sent along).

Mass way:

  1. User grabs data from a user sharing the data.
  2. All Product rows from the sharing user are grabbed.
  3. A SQL bulk copy insert on the Product rows happens.
  4. My SP selects all rows that exist only in the Product table and meet some other conditions.
  5. A mass insert happens with those rows.

So what happens if step 3 (manual way) is happening at the same time as step 4 (mass way)? I think it would try to insert the same row twice, causing a primary key constraint exception.

chobo2

7 Answers


In that scenario, I would use SqlBulkCopy to insert into a staging table (i.e. one that looks like the data I want to import, but isn't part of the main transactional tables), and then, at the DB, do an INSERT/SELECT to move the data into the first real table.

Now I have two choices depending on the server version; I could do a second INSERT/SELECT to the second real table, or I could use the INSERT/OUTPUT clause to do the second insert, using the identity values from the first insert.

For example:

     -- dummy schema
     CREATE TABLE TMP (data varchar(max))
     CREATE TABLE [Table1] (id int not null identity(1,1), data varchar(max))
     CREATE TABLE [Table2] (id int not null identity(1,1), id1 int not null, data varchar(max))

     -- imagine this is the SqlBulkCopy
     INSERT TMP VALUES('abc')
     INSERT TMP VALUES('def')
     INSERT TMP VALUES('ghi')

     -- now push into the real tables
     INSERT [Table1]
     OUTPUT INSERTED.id, INSERTED.data INTO [Table2](id1,data)
     SELECT data FROM TMP
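
On the client side, the flow might look like the following C# sketch. This is a hedged illustration, not part of the original answer: the `connectionString` variable is assumed to be defined, and it reuses the dummy TMP/Table1/Table2 schema above; the TRUNCATE at the end is a single-user simplification (real code would scope the staging rows per request, as discussed in the comments below).

     using System.Data;
     using System.Data.SqlClient;

     // Build the rows to import (stand-in for the real source data).
     var table = new DataTable();
     table.Columns.Add("data", typeof(string));
     table.Rows.Add("abc");
     table.Rows.Add("def");
     table.Rows.Add("ghi");

     using (var conn = new SqlConnection(connectionString)) // assumed
     {
         conn.Open();

         // 1. SqlBulkCopy into the staging table.
         using (var bulk = new SqlBulkCopy(conn))
         {
             bulk.DestinationTableName = "TMP";
             bulk.WriteToServer(table);
         }

         // 2. Push from staging into the real tables in one batch;
         //    OUTPUT carries the generated identity values into Table2.
         const string sql = @"
             INSERT [Table1]
             OUTPUT INSERTED.id, INSERTED.data INTO [Table2](id1, data)
             SELECT data FROM TMP;
             TRUNCATE TABLE TMP;";
         using (var cmd = new SqlCommand(sql, conn))
         {
             cmd.ExecuteNonQuery();
         }
     }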
Marc Gravell
  • Hmm. I've been working on something; do you think it would work (see my edit)? If not, I guess I will try a staging table. – chobo2 May 31 '10 at 19:54
  • @chobo2 - well, except in a few scenarios I would use a staging table anyway - so that a: I'm not impacting the real table during network IO time, and b: to get full transaction logs. – Marc Gravell May 31 '10 at 19:59
  • OK, I just got my edit up. Going through it, I am thinking I might have to do the staging table way; not sure yet. I have some questions about your way, though. 1st, regarding the `--` dummy schema: are these dummy tables created on the fly or just used for example purposes? 2nd, how do you do a SqlBulkCopy in a stored procedure? 3rd, how does this push thing work - you just insert the entire table or something? 4th, what about concurrent connections with maybe a couple of users? It would all go to the staging table, so there would need to be some way to know which data to add and then delete? – chobo2 May 31 '10 at 20:26
  • @chobo2 - I would have the staging table there as a permanent table. You might need some extra work if you need parallel usage, though. Re SP; I would use `SqlBulkCopy` *then* call an SP, but you can use bulk-insert (not `SqlBulkCopy`) from TSQL. 3: don't understand the question, but you write the code you need for your scenario... 4: in that case I would add a column to the staging table to identify the different requests (all rows for request A would have the same value). Alternatively, enforce (separately) that only one happens at once (but parallel is nicer). – Marc Gravell May 31 '10 at 20:33
  • So how do I take the results from a select statement and then do an insert with them? I am not sure how to store them in an SP and then do a bulk insert. – chobo2 May 31 '10 at 20:38
  • @chobo2 - the example does a select/insert; is there something more specific? – Marc Gravell May 31 '10 at 21:23
  • Ah, maybe I am just getting confused by this line: OUTPUT INSERTED.id, INSERTED.data INTO [Table2](id1,data) – chobo2 May 31 '10 at 21:26
  • FYI: Example code for using a staging table along with bulk insert from c# here: http://stackoverflow.com/a/41289532/361842 – JohnLBevan Dec 22 '16 at 18:25
  • @MarcGravell You didn't add an FK between Table1 and Table2 like OP's example. Using the "INSERT/OUTPUT" construct results in the error "The target table 'Table2' of the OUTPUT INTO clause cannot be on either side of a (primary key, foreign key) relationship. Found reference constraint 'FK_Field1_Table1"... – BassGod Dec 16 '21 at 22:14

If your app allows it, you could add another column in which you store an identifier of the bulk insert (a GUID, for example). You would set this id explicitly.

Then after the bulk insert, you just select the rows that have that identifier.
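
A rough C# sketch of that idea (hedged: the `BatchId` column, the `Product` table, and an open `SqlConnection conn` are illustrative assumptions, not from the answer):

    // Stamp every row with one batch identifier before the bulk copy.
    var batchId = Guid.NewGuid();
    foreach (DataRow row in table.Rows)
        row["BatchId"] = batchId;

    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "Product"; // hypothetical table
        bulk.WriteToServer(table);
    }

    // Read back the generated ids for exactly this batch.
    var cmd = new SqlCommand(
        "SELECT ProductId FROM Product WHERE BatchId = @batchId", conn);
    cmd.Parameters.AddWithValue("@batchId", batchId);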

Cosmin

I had the same issue, where I had to get back the ids of the rows inserted with SqlBulkCopy. My ID column was an identity column.

Solution:

I have inserted 500+ rows with bulk copy, and then selected them back with the following query:

SELECT TOP (@InsertedRowCount) *
FROM   MyTable
ORDER BY ID DESC

This query returns the rows I have just inserted, with their ids. In my case I had another unique column, so I selected that column and the id, then mapped them with an IDictionary, like so:

 IDictionary<string, int> myMap = new Dictionary<string, int>();
 // for each row returned by the query above:
 myMap[name] = id;
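
Expanded into a runnable sketch (hedged: the `MyTable`/`ID`/`Name` names and an open `SqlConnection conn` are assumptions carried over from the answer's example):

    var idByName = new Dictionary<string, int>();

    // Select back the N most recently inserted rows by descending id.
    var cmd = new SqlCommand(
        "SELECT TOP (@n) ID, Name FROM MyTable ORDER BY ID DESC", conn);
    cmd.Parameters.AddWithValue("@n", insertedRowCount);

    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            idByName[reader.GetString(1)] = reader.GetInt32(0);
    }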

Hope this helps.

Leigh
  • 20
    This is a good solution, but **ONLY** if you can **guarantee** that no records are being inserted from another thread after you insert but before you select the items. – marknuzz Jun 17 '13 at 19:00
  • besides the valid point given by @nuzzolilo, this answer deserves more upvotes. – Dirk Boer Jul 19 '19 at 10:25

My approach is similar to what RiceRiceBaby described, except one important thing to add: the call to retrieve Max(Id) needs to be part of a transaction, along with the call to SqlBulkCopy.WriteToServer. Otherwise, someone else may insert during your transaction and this would make your ids incorrect. Here is my code:

public static void BulkInsert<T>(List<ColumnInfo> columnInfo, List<T> data,
    string destinationTableName, SqlConnection conn = null, string idColumn = "Id")
{
    NLogger logger = new NLogger();

    var closeConn = false;

    if (conn == null)
    {
        closeConn = true;
        conn = new SqlConnection(_connectionString);
        conn.Open();
    }

    // Serializable isolation takes a range lock, so no other writer can
    // insert rows between the Max(id) read and the bulk copy below.
    SqlTransaction tran =
        conn.BeginTransaction(System.Data.IsolationLevel.Serializable);

    try
    {
        var options = SqlBulkCopyOptions.KeepIdentity;
        using (var sbc = new SqlBulkCopy(conn, options, tran))
        {
            // Read the current max id inside the transaction.
            var command = new SqlCommand(
                $"SELECT Max({idColumn}) from {destinationTableName};", conn, tran);
            var id = command.ExecuteScalar();

            int maxId = 0;

            if (id != null && id != DBNull.Value)
            {
                maxId = Convert.ToInt32(id);
            }

            // Assign ids client-side, continuing from the current max.
            data.ForEach(d =>
            {
                maxId++;
                d.GetType().GetProperty(idColumn).SetValue(d, maxId);
            });

            var dt = ConvertToDataTable(columnInfo, data);

            sbc.DestinationTableName = destinationTableName;

            foreach (System.Data.DataColumn dc in dt.Columns)
            {
                sbc.ColumnMappings.Add(dc.ColumnName, dc.ColumnName);
            }

            sbc.WriteToServer(dt);
        }

        tran.Commit();

        if (closeConn)
        {
            conn.Close();
            conn = null;
        }
    }
    catch (Exception ex)
    {
        tran.Rollback();
        logger.Write(LogLevel.Error,
            $@"An error occurred while performing a bulk insert into table {destinationTableName}. The entire transaction has been rolled back.

{ex.ToString()}");
        throw; // rethrow without resetting the stack trace
    }
}
Becca
  • I considered an approach like yours, but I don't think this solves the problem of ID collisions. Using a `Transaction` doesn't prevent anyone else from using the table, inserting rows, and consuming IDs before you complete your insert. Any single insert prior to completion and this blows up. Also, `sbc` must be disposed. – DonBoitnott Jun 19 '19 at 13:19
  • What am I missing? According to Microsoft docs if System.Data.IsolationLevel.Serializable is used, then "A range lock is placed on the DataSet, preventing other users from updating or inserting rows into the dataset until the transaction is complete." I also tested by attempting to insert from SSMS while my bulkinsert was being made, and my insert always occurred after the entire bulkinsert was completed. – Becca Jun 20 '19 at 16:17
  • Apparently my understanding of serializable was incomplete. Now that I've studied it more closely I see what you mean. Looks like it should block subsequent writes until the batch insert is done. My mistake. – DonBoitnott Jun 20 '19 at 17:02
  • No problem! It's been pretty durable, I've been using it in production for a while with no issues. – Becca Jun 20 '19 at 17:30

Depending on your needs and how much control you have of the tables, you may want to consider using UNIQUEIDENTIFIERs (Guids) instead of your IDENTITY primary keys. This moves key management outside of the database and into your application. There are some serious tradeoffs to this approach, so it may not meet your needs. But it may be worth considering. If you know for sure that you'll be pumping a lot of data into your tables via bulk-insert, it is often really handy to have those keys managed in your object model rather than your application relying on the database to give you back the data.

You could also take a hybrid approach with staging tables as suggested before. Get the data into those tables using GUIDs for the relationships, and then via SQL statements you could get the integer foreign keys in order and pump data into your production tables.
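
For the first idea, here is a sketch of what client-side key generation might look like (hedged: the Product/ProductReview table shapes are hypothetical; the point is only that both keys are known before anything touches the database):

    // Generate the key in the application, so parent and child rows
    // can be wired up before either bulk copy runs.
    var productId = Guid.NewGuid();

    var productRow = productTable.NewRow();
    productRow["ProductId"] = productId;      // uniqueidentifier PK
    productRow["ProductName"] = "Ipad";
    productTable.Rows.Add(productRow);

    var reviewRow = reviewTable.NewRow();
    reviewRow["ProductId"] = productId;       // FK: same value, no round trip
    reviewTable.Rows.Add(reviewRow);

    // Bulk copy productTable first, then reviewTable; there is no need
    // to ask the database for the generated keys.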

mattmc3

I would:

  1. Turn on identity insert on the table

  2. Grab the Id of the last row of the table

  3. Loop: for (int i = id; i < dataTable.Rows.Count + 1; i++)

  4. In the loop, assign the Id property of your dataTable row to i + 1.

  5. Run your SQL bulk insert with your keep identity turned on.

  6. Turn identity insert back off

I think that's the safest way to get your ids with a SQL bulk insert, because it will prevent mismatched ids that could be caused by the application executing on another thread.

RiceRiceBaby

Disclaimer: I'm the owner of the project C# Bulk Operations

The library overcomes SqlBulkCopy limitations and adds flexible features like outputting the inserted identity values.

Under the hood, it does exactly what the accepted answer does, but it is much easier to use.

var bulk = new BulkOperation(connection);

// Output Identity
bulk.ColumnMappings.Add("ProductID", ColumnMappingDirectionType.Output);
// ... Column Mappings...

bulk.BulkInsert(dt);
Jonathan Magnan
  • I am wondering how people can market their products on this forum. :-) SqlBulkCopy-related features are just wrapped, for that much cost... Incredible. – Usman Sep 14 '16 at 10:22
  • Hi, can you please expand on how to get the inserted row ID back, with date and time? I can't seem to get the IDs for the rows which were successfully inserted when I tried your code. – Transformer May 28 '19 at 22:08
  • Here is a Fiddle that shows how to return the value: https://dotnetfiddle.net/g5pSS1. To return more columns, you just need to map each column with the direction `Output`. – Jonathan Magnan May 29 '19 at 11:17