9

I have a statement that looks something like this:

MERGE INTO someTable st
USING
(
    SELECT id,field1,field2,etc FROM otherTable
) ot on st.field1=ot.field1
WHEN NOT MATCHED THEN
    INSERT (field1,field2,etc)
    VALUES (ot.field1,ot.field2,ot.etc)

where otherTable has an autoincrementing id field.

I would like the insertion into someTable to be in the same order as the id field of otherTable, such that the order of ids is preserved when the non-matching fields are inserted.

A quick look at the docs would appear to suggest that there is no feature to support this.

Is this possible, or is there another way to do the insertion that would fulfil my requirements?

EDIT: One approach to this would be to add an additional field to someTable that captures the ordering. I'd rather not do this if possible.

... upon reflection the approach above seems like the way to go.

GEOCHET
  • 21,119
  • 15
  • 74
  • 98
spender
  • 117,338
  • 33
  • 229
  • 351

3 Answers3

24

I cannot speak to what the Questioner is asking for here because it doesn't make any sense.

So let's assume a different problem:

Let's say, instead, that I have a Heap-Table with no Identity-Field, but it does have a "Visited" Date field.
The Heap-Table logs Person WebPage Visits and I'm loading it into my Data Warehouse.
In this Data Warehouse I'd like to use the Surrogate-Key "WebHitID" to reference these relationships.
Let's use Merge to do the initial load of the table, then continue calling it to keep the tables in sync.

I know that if I'm inserting records into an table, then I'd prefer the ID's (that are being generated by an Identify-Field) to be sequential based on whatever Order-By I choose (let's say the "Visited" Date).
It is not uncommon to expect an Integer-ID to correlate to when it was created relative to the rest of the records in the table.
I know this is not always 100% the case, but humor me for a moment.

This is possible with Merge.

Using (what feels like a hack) TOP will allow for Sorting in our Insert:

MERGE DW.dbo.WebHit AS Target --This table as an Identity Field called WebHitID.
USING
(
    SELECT TOP 9223372036854775807 --Biggest BigInt (to be safe).
           PWV.PersonID, PWV.WebPageID, PWV.Visited
      FROM ProdDB.dbo.Person_WebPage_Visit AS PWV
     ORDER BY PWV.Visited --Works only with TOP when inside a MERGE statement.
) AS Source
  ON Source.PersonID  = Target.PersonID
 AND Source.WebPageID = Target.WebPageID
 AND Source.Visited   = Target.Visited
WHEN NOT MATCHED BY Target THEN --Not in Target-Table, but in Source-Table.
    INSERT (PersonID, WebPageID, Visited) --This Insert populates our WebHitID.
    VALUES (Source.PersonID, Source.WebPageID, Source.Visited)
WHEN NOT MATCHED BY Source THEN --In Target-Table, but not in Source-Table.
    DELETE --In case our WebHit log in Prod is archived/trimmed to save space.
;


You can see I opted to use TOP 9223372036854775807 (the biggest Integer there is) to pull everything.
If you have the resources to merge more than that, then you should be chunking it out.
While this screams "hacky workaround" to me, it should get you where you need to go.

I have tested this on a small sample set and verified it works. I have not studied the performance impact of it on larger complex sets of data though, so YMMV with and without the TOP.

MikeTeeVee
  • 18,543
  • 7
  • 76
  • 70
  • 1
    This worked, I really wish there was a better way of achieving insert order when using merges in 2022. – JARRRRG Apr 06 '22 at 09:31
  • @JARRRRG Merge (in my opinion) is a novelty. You are better off writing separate DML Operations (Insert/Update/Delete). Merge has caused me far too much trouble in the past; all for the novelty of a different syntax. There is no performance benefit to it (in fact, it has worse performance) and I find it is easier to make mistakes when implementing Merge, so I avoid it altogether. – MikeTeeVee Apr 07 '22 at 17:01
  • I use merges to seed data in to tables for records that dont exist. Im not sure how easy this would be to achieve with just an insert. These scripts need to be able to run multiple times against the db and only insert a record if it doesnt exist. I could do if statements to see if a record doesnt exist but to do this for say 100 records would be impractical. – JARRRRG Apr 07 '22 at 17:06
  • @JARRRRG As a best-practice of mine, I never use `IF` Statements to skip my DML operations. I learned long ago that when you use an `IF` to branch your DML, when the plan is cached (i.e. for a Sproc) and the IF-Block was skipped, then the next time the Sproc is run, it may perform poorly when applying the first cached execution plan (that missed the IF-Block) when running the DML in the IF-Block if it is not skipped the next time around. Because of this, I just add logic to my Where-Clause to skip the DML if conditions are not right. It's fine if you Insert/Update/Delete Zero (0) Records. – MikeTeeVee Apr 07 '22 at 19:17
  • 1
    I put together a quick fiddle to play about with what you said. I guess yeah, Merge isn't really any more 'readable' than just separate Insert / Updates. https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=31604387b3ca90db22f4afd6467a9e86 – JARRRRG Apr 07 '22 at 19:59
10

Following up on MikeTeeVee's answer.

Using TOP will allow you to Order By within a sub-query, however instead of TOP 9223372036854775807, I would go with

SELECT TOP 100 PERCENT 

Unlikely to reach that number, but this way just makes more sense and looks cleaner.

Hitesh
  • 3,449
  • 8
  • 39
  • 57
MMartin
  • 172
  • 2
  • 10
  • 6
    `SELECT TOP 100 PERCENT` does NOT work for me in this scenario, it leaves **unsorted** order. Perhpas it is caught as special case and bypassed by optimizer. If I replace it by `SELECT TOP 9223372036854775807`, sorting is immediately in effect. SQL Server 2008 R2. – miroxlav Feb 19 '16 at 14:39
  • 4
    **See also:** MSDN blog article [TOP 100 Percent ORDER BY Considered Harmful](https://blogs.msdn.microsoft.com/queryoptteam/2006/03/24/top-100-percent-order-by-considered-harmful/) – miroxlav Mar 26 '16 at 17:04
-4

Why would you care about the order of the ids matching? What difference would that make to how you query the data? Related tables should be connected through primary and foreign keys, not order records were inserted. Tables are not inherently ordered a particular way in databases. Order should come from the order by clause.

More explanation as to why you want to do this might help us steer you to an appropriate solution.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • 3
    +1 If you do care, there's something about the data that you're trying to capture that matters to you - so do it properly. It sounds like the data is telling you it wants an explicit order column. – Carl Manaster May 19 '09 at 18:11
  • 3
    The data that I am trying to insert is chronologically ordered, but does not have any explicit fields to indicate this ordering... however, the ordering in otherTable when sorted by id is chronologically correct. Thinking about your answer makes me wonder whether my schema could do with an update. You are correct that relying upon the id field is probably a dodgy approach. – spender May 19 '09 at 18:13
  • 5
    While your answer is academically correct, there are cases when you cannot afford changing the schema etc. Or see [MikeTeeVee's answer](http://stackoverflow.com/a/23058548/2392157). Then you want a simple hint how to bypass that academic principle and enforce ordering in merging operation. Your answer lacks these details, therefore `-1`. – miroxlav Feb 19 '16 at 14:52
  • 1
    @miroxlav, I did not include those details because what the op was asking for is a bad thing that under no circumstances should be done. It is not academic, it is real life. What he wanted can never assure that things are correctly matched. It is irresponsible to hack up something that might be right but can't be guaranteed. It is a matter of data integrity. Suppose one record got rejected in inserting to the second table, then every other match from there on down the line is incorrect. Suppose two records got inserted simultaneously? – HLGEM Feb 19 '16 at 15:20
  • 3
    "Why would you care about the order of the ids matching?" is not an answer to the question of "How do I make sure inserts happen in the correct order using a merge?" – Nick Aug 28 '17 at 17:16
  • It is because you can't guarantee the order that ids were inserted into the table, even auto-generated ones. Sometimes the best recommendation is not what the person asked for originally. – HLGEM Aug 28 '17 at 18:23