0

I have target table and a staging table, for example:

create table #Target (Id int not null IDENTITY(1,1),Value int not null)
create table #Staging (TrackingId int not null, Value int not null)
insert #Staging (TrackingId, Value) values (201,1000),(202,1000),(203,2000),(204,2000)

Note that:

  • The target table has an identity column.
  • The staging table has a TrackingId column, which is not included in the target table.

I want to insert data from my staging table to my real table such that I keep the mapping from my #Staging.TrackingId to the newly created #Target.Id. My output, after the insertion, should look like this:

Id, TrackingId
 1,        201
 2,        202
 3,        203
 4,        204

(any other mapping is also possible, based on the order in which the insert occurs)

I tried two things. One is an order-ed insert, with an output clause. This looks neat but I'm not sure SQL Server guarantees the order:

insert into #Target (Value)
OUTPUT inserted.Id --shame ", s.TrackingId" isn't allowed...
select s.Value from #Staging s
order by s.TrackingId

I also tried using MERGE. This works, but seems wasteful (and a little dumb):

MERGE INTO #Target
USING #Staging AS s
    ON 1=2 --dummy condition, we know we want an insert
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Value)
    VALUES (s.Value)
OUTPUT INSERTED.Id, s.TrackingId

What is the correct approach for getting the newly created identity values and keeping the relation to my original order?

Kobi
  • 135,331
  • 41
  • 252
  • 292
  • Are trackingId values unique? – Zohar Peled Feb 12 '17 at 12:51
  • @ZoharPeled - Yes, that's why I've added it. The other values are not unique (in reality, about 20 columns). – Kobi Feb 12 '17 at 12:54
  • I dont see the problem. `INSERT INTO SELECT ... ORDER BY` do the job. SQL Server doesnt guarantee the order, that is why you use `ORDER BY` after that you data is order the way you want. And will be inserted in that order – Juan Carlos Oropeza Feb 12 '17 at 12:57
  • AFAIK, Merge is the only option to get valies from both the source and the target of the insert from the output clause. So if you are not inserting the tracking id I think that's your only choise. – Zohar Peled Feb 12 '17 at 12:57
  • @JuanCarlosOropeza - The `insert` looks great, I liked that solution. But will it always work? Is that what they mean by "INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted"? – Kobi Feb 12 '17 at 13:01
  • 1
    I don't know why you care about the order, since database tables are unsorted by nature. I've used the merge technique to insert multiple parents and children into the database before, (and I suspect this might have something to do with your requirement). You might want to look at [this post.](http://stackoverflow.com/questions/34826450/fastest-way-to-perform-nested-bulk-inserts-with-scope-identity-usage) – Zohar Peled Feb 12 '17 at 14:01
  • @Zohar - Yes, this is a similar scenario. My application has additional data for each row that is not saved in the database, and I need the ID for each item in my list. This isn't about the order in the database, but about the order in which the IDs are being generated vs. the order of my tracking value. – Kobi Feb 12 '17 at 14:04
  • In that case, I would go with the merge solution. – Zohar Peled Feb 12 '17 at 14:25

1 Answers1

1

Yes. the insert will always work, once you include the order by, the insert will be executed in that order.

Here I change the staging order, btw you dont need OUTPUT

SQL DEMO

insert #Staging (TrackingId, Value) values (201,1000),(204,2000),(203,2000),(202,1000);
                                             ^          ^          ^          ^

INSERT INTO #Target (Value <, otherfields>)
SELECT TrackingID <, otherfields>
FROM #Staging
ORDER BY TrackingID
;


SELECT *
FROM #Target;

Please read the comments below in that article the answer from the author:

  • Could you elaborate on statement #4.

Yes, the identity values will be generated in the sequence established by the ORDER BY. If a clustered index exists on the identity column, then the values will be in the logical order of the index keys. This still doesn’t guarantee physical order of insertion. Index maintenance is a different step and that could also be done in parallel for example. So you could end up generating the identity values based on ORDER BY clause and then feeding those rows to the clustered index insert operator which will perform the maintenance task. You can see this in the query plan. You should really NOT think about physical operations or order but instead think of a table as a unordered set of rows. The index can be used to sort rows in logical manner (using ORDER BY clause) efficiently.

Kobi
  • 135,331
  • 41
  • 252
  • 292
Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • 1
    sequence of identity values assigned with `INSERT...SELECT` is determined by `ORDER BY` as documented in the SQL Server Books Online (https://msdn.microsoft.com/en-us/library/ms174335.aspx). This is unrelated to indexes on the target table because a table is logically an unordered set of rows by definition. – Dan Guzman Feb 12 '17 at 15:23
  • @ZoharPeled The article was linked in OP question. – Juan Carlos Oropeza Feb 12 '17 at 22:32