This question is part of a much more complex problem that I am breaking down into smaller chunks (for my sanity).
Let's say I have a Parts
table that is self-referencing and looks something like this:
PartId ParentPartId Description PartNumber IsCatHeader ProviderId
---------------------------------------------------------------------------------------
9292 null 'Engine Parts' null 1 'Engine Parts||1'
9293 9292 'Engine Bolts' null 1 'Engine Bolts||1'
9294 9293 '6mm Engine Bolt' 'X1-234-ABC' 0 '6mm Engine Bolt|X1-234-ABC|0'
9295 9293 '5mm Engine Bolt' 'X2-934-BCD' 0 '5mm Engine Bolt|X2-934-BCD|0'
9296 9295 '5mm Engine Bolt Washer' 'X2-934-GED' 0 '5mm Engine Bolt Washer|X2-934-GED|0'
You get the idea. Now... we are importing entire books (massive CSV files) of these parts, to the tune of several hundred line items in a single book.
Parts are often duplicated across books, and part of our job is to keep duplicates out of the database.
The source does not provide any kind of unique id for these parts, so we have created a ProviderId
column that is a collection of data parts from each record that creates a unique string. We can then use this to check for duplicates as we do the import. (The actual data in this column is more complex than what I've shown here.)
So, now to my problem. I am trying to figure out the best way to do this in bulk. One option (NOT a good one) is to cycle through each item one at a time from the C# application.... insert a parent, get the SCOPE IDENTITY, insert all children, etc. Yuck. In a large book, this would result in thousands of DB calls per book. Not an option.
We need a bulk insert solution. But we have a real conundrum with the self-referencing aspect of this.
Our original thought was to build the ENTIRE data model in C#, including all of the PartId
's and ParentPartId
's. Then bulk insert directly into the Parts
table. The problem with this, however, is knowing what ID to start with. Keep in mind, multiple processes will be running simultaneously, and many of the parts will be duplicates. We tried using a SEQUENCE object, but that presented problems... it's 100% possible that duplicate books will be processed, which would result in giant gaps in the ID's if we use a SEQUENCE.
The course I am chasing now is this... We have created a Parts_Staging
table that looks almost like the actual Parts
table. And we can do a bulk insert to it, no problem. Then it's a simple query to use the ProviderId
column to look for records in Parts_Staging
that don't exist in Parts
and move them over.
But with this path, I am not creative (or experienced) enough to imagine a way to do this move / merge and to keep the self-referencing id's intact.
I've been reading threads like "How to Insert data into self reference table in sql server?" and "T-SQL - Insert Data into Parent and Child Tables" but so far I am still not seeing the vision.