SQL Server - Compare 2 tables for data in the same columns without checking for equality

Question

I asked this question before here, but the answer actually wasn't what I was looking for.

Suppose I have the following two tables in my SQL Server (2012) DB:

Tbl1:

ID:     Col1:     Col2:     Col3:
1       Val11     Val21     Val31
2       <NULL>    Val21     <NULL>
3       Val11     <NULL>    Val31
4       Val11     <NULL>    Val31

Tbl2:

ID:     Col1:     Col2:     Col3:
1       Val12     Val22     Val32
2       <NULL>    Val22     <NULL>
3       <NULL>    <NULL>    Val32
5       <NULL>    <NULL>    Val32

And, at this point, all I want to see is:

Any rows that are in one table but not the other (based on ID pk)
If the ID is in both tables, then are the same columns populated (not caring specifically about the values yet).

I'm just trying to come up with a SQL to just let me know which IDs have discrepancies.

My ideal output would look as follows:

Tbl1_ID:       Tbl2_Id:       Discrepancy:
1              1              0
2              2              0
3              3              1
4              <NULL>         1
<NULL>         5              1

My testing SQL so far is this:

DECLARE 
@Tbl1 TABLE (ID INT, Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10))
;
DECLARE
@Tbl2 TABLE (ID INT, Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10))
;

INSERT INTO @Tbl1 (ID, Col1, Col2, Col3)
VALUES
    (1, 'Val11', 'Val21', 'Val31')
    ,(2, NULL   , 'Val21', NULL)
    ,(3, 'Val11', NULL, 'Val31')
    ,(4, 'Val11', NULL, 'Val31')
;

INSERT INTO @Tbl2 (ID, Col1, Col2, Col3)
VALUES
    (1, 'Val12', 'Val22', 'Val32')
    ,(2, NULL   , 'Val22', NULL)
    ,(3, NULL, NULL, 'Val32')
    ,(5, NULL, NULL, 'Val32')
;


SELECT
     [@Tbl1].ID AS Tbl1_ID
    ,[@Tbl2].ID AS Tbl2_ID
    , -- Some kind of comparison to let me know if all columns are populated the same
     AS Discrepancy
FROM
    @Tbl1
    FULL JOIN @Tbl2
    ON [@Tbl1].ID = [@Tbl2].ID

As you can see in the previous question's answer, the solution proposed (and I didn't check it well enough) was doing an ISNULL(Tbl1.Col1, xx) = ISNULL(Tbl2.Col1, xx) kind of comparison for the columns I'm looking for. The problem there is that it is, in fact, checking the values of the two tables against each other. All I want to do is check if they are both populated or not, without any need to do a value-comparison.

I know I could accomplish this with something along the lines of CASE WHEN Tbl1.Col1 is NULL THEN 1 ELSE 0 END = CASE WHEN Tbl2.Col1 IS NULL THEN 1 ELSE 0 END, but I'm hoping there is a nicer way to do this since I am checking a lot of columns.

Is there some good way to accomplish this?

Thanks!!

score 2 · Accepted Answer · answered Aug 22 '18 at 17:51

You could use:

SELECT
     t1.ID AS Tbl1_ID
    ,t2.ID AS Tbl2_ID
    ,CASE WHEN NOT EXISTS (
                -- here use template and put column list from table 1
        SELECT CASE WHEN t1.Col1 IS NOT NULL THEN -1 ELSE 0 END, 
               CASE WHEN t1.Col2 IS NOT NULL THEN -1 ELSE 0 END, 
               CASE WHEN t1.Col3 IS NOT NULL THEN -1 ELSE 0 END 
        INTERSECT 
               -- here use template and put column list from table 2
        SELECT CASE WHEN t2.Col1 IS NOT NULL THEN -1 ELSE 0 END, 
               CASE WHEN t2.Col2 IS NOT NULL THEN -1 ELSE 0 END, 
               CASE WHEN t2.Col3 IS NOT NULL THEN -1 ELSE 0 END)
     THEN 1 ELSE 0 END AS Discrepancy
FROM @Tbl1 t1
FULL JOIN @Tbl2 t2
    ON t1.ID = t2.ID;

Rextester.com Demo

It is a variant of IS DISTINCT FROM. If you care about actual values then:

SELECT
     t1.ID AS Tbl1_ID
    ,t2.ID AS Tbl2_ID
    ,CASE WHEN NOT EXISTS (
        SELECT t1.Col1, t1.Col2, t1.Col3
        INTERSECT 
        SELECT t2.Col1, t2.Col2, t2.Col3)
     THEN 1 ELSE 0 END AS Discrepancy
FROM @Tbl1 t1
FULL JOIN @Tbl2 t2
    ON t1.ID = t2.ID;

Thanks so much - This definitely seems like the most logical solution. The thing is it seems to be just about the same as a `CASE WHEN` upon multiple `CASE WHEN Tbl1.Col1 is NULL THEN 1 ELSE 0 END = CASE WHEN Tbl2.Col1 IS NULL THEN 1 ELSE 0 END AND ....` other statements. does the `INTERSECT` have any better reasons to use over the `CASE WHEN`s? — John Bustos, Aug 23 '18 at 14:42
@JohnBustos Single intersect will handle all columns at once(second sample code with correct NULL handling). More info [How to rewrite IS DISTINCT FROM and IS NOT DISTINCT FROM?](https://stackoverflow.com/a/50280408/5070879) — Lukasz Szozda, Aug 23 '18 at 15:49

John Cappelletti · Answer 2 · 2018-08-22T18:19:32.480

You mentioned "not checking values yet".

Here is one approach which will check both PK and values (assuming the tables have identical structures)

I forced ID 1 to be identical on both tables just for illustration

Example

DECLARE @Tbl1 TABLE (ID INT, Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10));
DECLARE @Tbl2 TABLE (ID INT, Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10));

INSERT INTO @Tbl1 (ID, Col1, Col2, Col3) VALUES
 (1, 'Val11', 'Val21', 'Val31')
,(2, NULL   , 'Val21', NULL)
,(3, 'Val11', NULL, 'Val31')
,(4, 'Val11', NULL, 'Val31')

INSERT INTO @Tbl2 (ID, Col1, Col2, Col3) VALUES
 (1, 'Val11', 'Val21', 'Val31')
,(2, NULL   , 'Val22', NULL)
,(3, NULL, NULL, 'Val32')
,(5, NULL, NULL, 'Val32')

Select T1=max(case when Src='T1' then ID end)
      ,T2=max(case when Src='T2' then ID end)
      ,ID_Discrepancy    = case when max(case when Src='T1' then ID end)=max(case when Src='T2' then ID end) then 0 else 1 end
      ,Value_Discrepancy = case when max(case when Src='T1' then Test else '' end) = max(case when Src='T2' then Test else '' end) then 0 else 1 end
From (
      Select Src='T1',ID,Test=(Select A1.* for XML Raw) From @Tbl1 A1 
      Union All
      Select Src='T2',ID,Test=(Select A2.* for XML Raw) From @Tbl2 A2
     ) A
 Group By ID

Returns

T1      T2      ID_Discrepancy  Value_Discrepancy
1       1       0               0
2       2       0               1
3       3       0               1
4       NULL    1               1
NULL    5       1               1

never seen this kind of comparison before - Pretty interesting! Thanks. I'll get started at dissecting it now. — John Bustos, Aug 22 '18 at 18:24
@JohnBustos The xml portion just convert the row into a string for comparison. Other methods may be more performant, but if you have many columns, this is a simplified approach. — John Cappelletti, Aug 22 '18 at 18:26
@JohnBustos Just to be clear. Rather than Select A1.* ... you could Select A1.Col1,A.Col2,... if your structures are different or if you want to compare a limited number of columns. — John Cappelletti, Aug 22 '18 at 18:32

score 1 · Answer 3 · answered Aug 22 '18 at 19:18

1

Not a whole new approach, but you could shorten the code with the help of a function which returns 0 and 1 depending on whether both arguments are null or both are not null

create function discr(@a varchar, @b varchar) returns int
as 
begin
    return iif((@a is null and not @b is null) or (@b is null and not @a is null),1,0)
end

and then

select sign(
     dbo.discr(a.Id, b.Id) +
     dbo.discr(a.col1, b.col1) + 
     dbo.discr(a.col2, b.col2) + 
     dbo.discr(a.col3, b.col3))
from @tbl1 a full outer join @tbl2 b on a.Id = b.Id

answered Aug 22 '18 at 19:18

CrimsonKing

2,696
1
14
11

It's nice in that it definitely shortens the actual query, but I'm guessing the many calls to the function would slow down the performance some. I'd have to check on that and see, but definitely a nice approach in making it easier to write the final SQL and contain all the comparison logic in one place. – John Bustos Aug 23 '18 at 14:28

Zohar Peled · Answer 4 · 2018-08-22T20:10:43.373

Well, I like Lukasz's answer best, and also that John has an XML based solution (as usual), but it was a nice question that got me thinking about if I can come up with a new and creative way to solve this, (no magic strings allowed, of course, that's just cheating) - so I came up with this idea. I'll be the first one to admit it's not as elegant as other solutions - but it was fun writing:

SELECT  T1.ID As Tbl1_ID,
        T2.ID As Tbl2_ID,
        CASE WHEN 
                T1.ID = T2.ID AND 
                ISNULL(SIGN(LEN(T1.COL1)),-1) = ISNULL(SIGN(LEN(T2.COL1)),-1) AND
                ISNULL(SIGN(LEN(T1.COL2)),-1) = ISNULL(SIGN(LEN(T2.COL2)),-1) AND
                ISNULL(SIGN(LEN(T1.COL3)),-1) = ISNULL(SIGN(LEN(T2.COL3)),-1) 
        THEN
            1
        ELSE 
            0
        END

FROM @Tbl1 T1
FULL JOIN @Tbl2 T2 ON T1.ID = T2.ID

And just for fun, with no practical use - a version only using ISNULL, NULLIF, COALESCE and SIGN:

SELECT  T1.ID As Tbl1_ID,
        T2.ID As Tbl2_ID,
        ISNULL(
            NULLIF(
                SIGN(
                    COALESCE(
                        ISNULL(NULLIF(T1.ID, T2.ID), NULLIF(T2.ID, T1.ID)), -- Ids
                        NULLIF(ISNULL(SIGN(LEN(T1.COL1)),-1), ISNULL(SIGN(LEN(T2.COL1)),-1)), -- Col1
                        NULLIF(ISNULL(SIGN(LEN(T1.COL2)),-1), ISNULL(SIGN(LEN(T2.COL2)),-1)), -- Col2
                        NULLIF(ISNULL(SIGN(LEN(T1.COL3)), -1), ISNULL(SIGN(LEN(T2.COL3)),-1)), -- Col3
                    -2)
                ), 
            -1), 
        0)
FROM @Tbl1 T1
FULL JOIN @Tbl2 T2 ON T1.ID = T2.ID

SQL Server - Compare 2 tables for data in the same columns without checking for equality

4 Answers4

Linked