0

If this query:

SELECT CONCAT(SOURCE.OrderNo, '_', SOURCE.OrderLine), 
       SOURCE.ProdOrder, 
       SOURCE.Lvl1, 
       SOURCE.Lvl2, 
       SOURCE.Lvl3, 
       SOURCE.LastDate 
FROM   dbo.SourceTbl AS SOURCE 

returns 11 records and this query:

SELECT CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine), 
       TARGET.ProdOrder, 
       TARGET.Lvl1, 
       TARGET.Lvl2, 
       TARGET.Lvl3, 
       TARGET.LastDate 
FROM   dbo.TargetTbl AS TARGET 

returns 17 records and the INTERSECT between the two:

SELECT CONCAT(SOURCE.OrderNo, '_', SOURCE.OrderLine), 
       SOURCE.ProdOrder, 
       SOURCE.Lvl1, 
       SOURCE.Lvl2, 
       SOURCE.Lvl3, 
       SOURCE.LastDate 
FROM   dbo.SourceTbl AS SOURCE 
INTERSECT 
SELECT CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine), 
       TARGET.ProdOrder, 
       TARGET.Lvl1, 
       TARGET.Lvl2, 
       TARGET.Lvl3, 
       TARGET.LastDate 
FROM   dbo.TargetTbl AS TARGET 

returns 9 records, when I do a MERGE like this:

MERGE dbo.TargetTbl AS TARGET
USING (
       SELECT   OrderNo, OrderLine, CONCAT(OrderNo, '_', OrderLine) AS OrderNoLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3,
                MAX(LastDate) AS LastDate
       FROM dbo.SourceTbl
       GROUP BY OrderNo, OrderLine, CONCAT(OrderNo, '_', OrderLine), SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3
      ) AS SOURCE 
      ON CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine) = OrderNoLine 
         AND TARGET.ProdOrder = SOURCE.ProdOrder
         AND TARGET.Lvl1 = SOURCE.Lvl1
         AND TARGET.Lvl2 = SOURCE.Lvl2
         AND TARGET.Lvl3 = SOURCE.Lvl3  
         AND TARGET.LastDate = SOURCE.LastDate
WHEN MATCHED AND EXISTS (SELECT CONCAT(SOURCE.OrderNo, '_', SOURCE.OrderLine)
                               ,SOURCE.ProdOrder
                               ,SOURCE.Lvl1
                               ,SOURCE.Lvl2
                               ,SOURCE.Lvl3 
                               ,SOURCE.LastDate
                         INTERSECT 
                         SELECT CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine)
                               ,TARGET.ProdOrder
                               ,TARGET.Lvl1
                               ,TARGET.Lvl2
                               ,TARGET.Lvl3
                               ,TARGET.LastDate
                        )
THEN UPDATE SET TARGET.IsBlocked = 1, TARGET.BlockDate = GETDATE()
WHEN NOT MATCHED BY TARGET 
THEN INSERT (LastDate, UsrID, DepID, OrderNo, OrderLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, IsBlocked, BlockDate)
     VALUES (SOURCE.LastDate, 999, 999, SOURCE.OrderNo, SOURCE.OrderLine, SOURCE.SomeModel, SOURCE.ProdOrder, SOURCE.Lvl1, SOURCE.Lvl2, SOURCE.Lvl3, 1, GETDATE());

it should, according to this and this, UPDATE the 9 INTERSECT records of the TargetTbl and INSERT to that same table the remaining 2 records from the SourceTbl (11 in total). Instead it updates 4 records and inserts 6 records (10 in total). Two records in the SourceTbl are duplicated and that's the reason of 10 instead of 11 and that's also why I used the MAX & GROUP BY.

I think it's the first part of the query, the USING part, that cannot handle NULLs correctly even if the INTERSECT part does its job. I tried everything I was able to, but no success. I'm sure it's something easily doable, so please, help me. Thank you.

EDIT: SourceTbl data by using SELECT OrderNo, OrderLine, CONCAT(OrderNo, '_', OrderLine) AS OrderNoLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, LastDate AS LastDate FROM dbo.SourceTbl ORDER BY OrderNo, OrderLine, SomeModel, ProdOrder, irrelevant columns are omitted:

OrderNo OrderLine   OrderNoLine SomeModel   ProdOrder   Lvl1    Lvl2    Lvl3    LastDate
123c08637   10  123c08637_10    4321525175_004321   A5C008837   Abcd    Efgh    Olol    04/03/2030
123c11214   10  123c11214_10    4321532622_000391   NULL    NULL    NULL    NULL    07/07/2018
123c13039   10  123c13039_10    4321525175_002611   A5C014838   NULL    NULL    NULL    18/05/2018
123c16059   10  123c16059_10    4321541488_001111   A5C018611   NULL    NULL    NULL    18/05/2018
123c17482   10  123c17482_10    4321506480_001711   A5C019227   Asdf    Ghjk    Cvnm    12/12/2018
123c17482   10  123c17482_10    4321506480_001711   A5C047712   Asdf    Ghjk    Cvnm    12/12/2018
123c17482   20  123c17482_20    4321506480_001712   A5B072554   aaaa    bbbb    cccc    18/05/2018
123c17482   20  123c17482_20    4321506480_001712   A5B072554   aaaa    bbbb    cccc    18/05/2018
123c17482   20  123c17482_20    4321506480_001712   A5B072554   aaaa    bbbb    xxxx    18/05/2018
123c17482   20  123c17482_20    4321506480_001712   A5B200472   NULL    NULL    NULL    18/05/2018
123c32405   10  123c32405_10    8765525667_005301   NULL    Qwer    Uiop    Tygh    12/12/2018
Community
  • 1
  • 1
Tomo
  • 429
  • 1
  • 10
  • 24
  • If you added a script that included sample data that would make things easier to discuss and understand. But I think your idea of "handling nulls" is misguided. NULL cannot be compared to anything to return a boolean (true/false) value. It returns . Attempting to join (which is how matching occurs) using null values will simply not work. (to be continued) – SMor Sep 10 '18 at 15:45
  • In addition, your update logic seems suspicous given the aggregation and the intersection. I suggest you break this down into simpler steps. First attempt the update part alone using a standard update statement. – SMor Sep 10 '18 at 15:50

3 Answers3

1

The GROUP BY might reduce the number of records to only one (if the 11 records only differ in the LastDate column and if SomeModel contains the same value for all 11 records) or it might result in all 11 records (if SomeModel contains unique values), so that GROUP BY does not neccessarily resturn the 10 distinct rows. To achieve this, use SELECT DISTINCT instead of grouping by a subset of the columns.

Also, if the ON condition worked as you seem to expect it, the additional EXISTS condition was obsolete. Obviously, 4 matches are found and 6 records have no match. Within these 6, there could be 2 records that indeed have no match and 4 records that do not match because of NULL values.

To take care of NULL values, I suggest to change the whole statement to something like this:

MERGE dbo.TargetTbl AS TARGET
USING (
       SELECT DISTINCT OrderNo, OrderLine, ProdOrder, Lvl1, Lvl2, Lvl3, LastDate
       FROM dbo.SourceTbl
      ) AS SOURCE 
      ON     (TARGET.OrderNo = SOURCE.OrderNo OR TARGET.OrderNo IS NULL AND SOURCE.OrderNo IS NULL)
         AND (TARGET.OrderLine = SOURCE.OrderLine OR TARGET.OrderLine IS NULL AND SOURCE.OrderLine IS NULL)
         AND (TARGET.ProdOrder = SOURCE.ProdOrder OR TARGET.ProdOrder IS NULL AND SOURCE.ProdOrder IS NULL)
         AND (TARGET.Lvl1 = SOURCE.Lvl1 OR TARGET.Lvl1 IS NULL AND SOURCE.Lvl1 IS NULL)
         AND (TARGET.Lvl2 = SOURCE.Lvl2 OR TARGET.Lvl2 IS NULL AND SOURCE.Lvl2 IS NULL)
         AND (TARGET.Lvl3 = SOURCE.Lvl3 OR TARGET.Lvl3 IS NULL AND SOURCE.Lvl3 IS NULL)
         AND (TARGET.LastDate = SOURCE.LastDate OR TARGET.LastDate IS NULL AND SOURCE.LastDate IS NULL)
WHEN MATCHED 
THEN UPDATE SET TARGET.IsBlocked = 1, TARGET.BlockDate = GETDATE()
WHEN NOT MATCHED BY TARGET 
THEN INSERT (LastDate, UsrID, DepID, OrderNo, OrderLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, IsBlocked, BlockDate)
     VALUES (LastDate, 999, 999, OrderNo, OrderLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, 1, GETDATE());
Wolfgang Kais
  • 4,010
  • 2
  • 10
  • 17
  • Right after asking here I figured it out, the solution was in front of my https://stackoverflow.com/questions/4509722/nulls-and-the-merge-statement-i-need-to-set-a-value-to-infinity-how nose - 2nd answer, which is the same as yours, so I'll mark it as answer. – Tomo Sep 11 '18 at 08:17
  • Changing GROUP BY with DISTINCT produces the same result. Both of them update all the duplicates. It's not clear to me what you tried to explain. – Tomo Sep 11 '18 at 08:22
  • Your GROUP BY and my DISTINCT may produce the same result in your specific scenario, but this is not necessarily so, because you do not group by *LastDate*. I wasn't able to see that both produce the same result, because you didn't provide the data. The main thing to change is the matching condition. – Wolfgang Kais Sep 11 '18 at 08:27
  • I had to use MAX & GROUP by because of this error: The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows. – Tomo Sep 11 '18 at 08:40
  • Yes, that's understood, but if the source records only differed in the *LastDate* column (and one is duplicated), your GROUP BY would produce only one row (the one with the maximum *LastDate* value). DISTINCT would produce the 10 unique rows. – Wolfgang Kais Sep 11 '18 at 09:47
  • Added the SourceTbl data to the bottom of the question. – Tomo Sep 11 '18 at 09:48
  • You mean if I have let's say 5 identical records where the only difference is the LastDate? That shouldn't happen, but it can happen because of the end user's human mistake. In this case I only need the record with the highest LastDate to be used in the UPDATE or INSERT part of query. – Tomo Sep 11 '18 at 10:22
  • Thanks for posting the data. So 5 rows can never match because of NULL values, the other 5 (6 reduced by 1 because of the duplicate) could, but obviously only 4 of them do, so there are in total 5+1 = 6 non-matching rows. – Wolfgang Kais Sep 11 '18 at 10:22
  • I might have changed a few records in the tables in order to cover all possible cases, but the initial records are the same as in the question (11, 17, 9). The corrected MERGE updates 11 records (9+2 duplicates) and inserts 1 new record. After that the records are 11, 18, 10 - seems ok to me. – Tomo Sep 11 '18 at 10:38
1

Some features of the SQL language use a concept of distinctness (notably, DISTINCT and GROUP BY) where it's notable that NULL IS NOT DISTINCT FROM NULL is true. This is also surfaced in UNION (ALL), EXCEPT, INTERSECT, etc.

Unfortunately, SQL Server does not yet implement the IS (NOT) DISTINCT FROM operator from standard SQL, in and of iteself; And so you're left using equality comparisons, where famously in SQL, NULL = NULL is unknown (not true or false). So you have to explicitly perform NULL checks in your ON clause (until a future version of SQL Server supports the DISTINCT FROM operator)

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
0

Use INTERSECT to compare NULLs properly:

MERGE dbo.TargetTbl AS TARGET
USING 
(   SELECT DISTINCT OrderNo, OrderLine, ProdOrder, Lvl1, Lvl2, Lvl3, LastDate 
    FROM dbo.SourceTbl
) AS SOURCE ON EXISTS
    (  SELECT TARGET.OrderNo, TARGET.OrderLine, TARGET.ProdOrder
         , TARGET.Lvl1, TARGET.Lvl2, TARGET.Lvl3, TARGET.LastDate 
       INTERSECT
       SELECT SOURCE.OrderNo, SOURCE.OrderLine, SOURCE.ProdOrder
         , SOURCE.Lvl1, SOURCE.Lvl2, SOURCE.Lvl3, SOURCE.LastDate
    )
     
Eric
  • 81
  • 4