Compare two rows and identify columns whose values are different

Question

The Situation

We have an application where we store machine settings in a SQL table. When the user changes a parameter of the machine, we create a "revision", that means we insert a row into a table. This table has about 200 columns. In our application, the user can take a look on each revision.

The Problem

We want to highlight the parameters that have changed since the last revision.

The Question

Is there an SQL-only way to get the column names of the differences between two rows?

An Example

ID | p_x | p_y | p_z
--------------------
11 | xxx | yyy | zzz

12 | xxy | yyy | zzy

The query should return p_x and p_z.

EDIT

The table has 200 columns, not rows...

MY WAY OUT

My intention was to find a "one-line-SQL-statement" for this problem.

I see in the answers below, it's kind a bigger thing in SQL. As there is no short, SQL-included solution for this problem, solving it in the backend of our software (c#) is of course much easier!

But as this is not a real "answer" to my question, I don't mark it as answered.

Thanks for the help.

It's unclear you want those column returned as rows? You want return name of column where there are differences or values of those columns? — Giorgi Nakeuri, Jan 28 '15 at 14:35
It is not at all clear what you are trying to do here. Are you wanting these values returned for every row in your table? If you could post ddl and sample data it would be a big step in the right direction. sqlfiddle.com is a great place to start. — Sean Lange, Jan 28 '15 at 15:12
@GiorgiNakeuri, I want the names of the columns that are different. — d.wing, Feb 02 '15 at 08:30
@d.wing With respect, I think your question was answered: you learned that the only solutions were complex ones. You chose to solve the problem a different way, but that's not the same as your question not being answered. — Ann L., Feb 04 '15 at 16:17
@Ann L. I agree with Ann about question was answered: moreover I would consider Weihui Guo's answer, even if complex, not so far from being a "one-line-Sql-statement". — pm., Dec 09 '20 at 09:12

Ann L. · Answer 1 · 2018-05-15T22:49:54.773

You say:

 We want to highlight the parameters that have changed since the last revision.

This implies that you want the display (or report) to make the parameters that changed stand out.

If you're going to show all the parameters anyway, it would be a lot easier to do this programmatically in the front end. It would be a much simpler problem in a programming language. Unfortunately, not knowing what your front end is, I can't give you particular recommendations.

If you really can't do it in the front end but have to receive this information in a query from the database (you did say "SQL-only"), you need to specify the format you'd like the data in. A single-column list of the columns that changed between the two records? A list of columns with a flag indicating which columns did or didn't change?

But here's one way that would work, though in the process it converts all your fields to nvarchars before it does its comparison:

Use the technique described here (disclaimer: that's my blog) to transform your records into ID-name-value pairs.

Join the resulting data set to itself on ID, so that you can compare the values and print those that have changed:

 with A as (    
--  We're going to return the product ID, plus an XML version of the     
--  entire record. 
select  ID    
 ,   (
      Select  *          
      from    myTable          
      where   ID = pp.ID                            
      for xml auto, type) as X 
from    myTable pp )
, B as (    
--  We're going to run an Xml query against the XML field, and transform it    
--  into a series of name-value pairs.  But X2 will still be a single XML    
--  field, associated with this ID.    
select  Id        
   ,   X.query(         
       'for $f in myTable/@*          
       return         
       <data  name="{ local-name($f) }" value="{ data($f) }" />      
       ') 
       as X2 from A 
)
,    C as (    
 --  We're going to run the Nodes function against the X2 field,  splitting     
 --  our list of "data" elements into individual nodes.  We will then use    
 -- the Value function to extract the name and value.   
 select B.ID as ID  
   ,   norm.data.value('@name', 'nvarchar(max)') as Name  
   ,   norm.data.value('@value', 'nvarchar(max)') as Value
from B cross apply B.X2.nodes('/myTable') as norm(data))

-- Select our results.

select *
from ( select * from C where ID = 123) C1
full outer join ( select * from C where ID = 345) C2
    on C1.Name = c2.Name
where c1.Value <> c2.Value 
  or  not (c1.Value is null and c2.Value is null)

This is great, but I'm wondering what the last line `or not (c1.Value is null and c2.Value is null)` accomplishes. What's the purpose of returning both values if they are not null? Why do we care, as long as they equal each other? — Gerald, Sep 14 '16 at 20:23
@Gerald That's there due to how SQL Server handles nulls. It's not the same as, say, C#. Anything compared to a null will always return false. — Ann L., Sep 15 '16 at 13:55
@Gerald `A = B` is always false if either `A` or `B` is null, _even if both of them are null_. `A < null` will _always_ be false, `A <> null` will be false, `Null = Null` will be false. And this is true _regardless of the value of A_. If you run `SELECT 1 WHERE Null = Null`, you won't get a record! — Ann L., Sep 15 '16 at 13:55
@Gerald So the only way to test for "both of them are null" is the way I did it: A and B are equal if `A = B or (A is null and B is null)`. — Ann L., Sep 15 '16 at 13:56
@Gerald Addendum: Some people would recommend `IsNull(A,'') = IsNull(B,'')` rather than my more complex comparison. But the problem there is that it doesn't perform very well because it makes it impossible to use any existing indexes. (Quite possibly my code, which chops up the data, can't use indexes anyway, but as a general rule, `IsNull(A,'') = IsNull(B,'')` is going to be slower than doing a separate check for null.) — Ann L., Sep 15 '16 at 14:01
Just wanted to note a few changes I had to make to make the above work for me. In the line with the cross apply I replaced B.X2.nodes('/myTable) with B.X2.nodes('data'). The closing quote is just a necessary typo. Switching to data was necessary to get any results back. I don't know if my xml nodes were supposed to be labeled with my table name but I messed something up, or what. — erosebe, Apr 18 '17 at 16:12

Weihui Guo · Answer 2 · 2020-02-03T20:23:27.387

You can use unpivot and pivot. The key is to transpose data so that you can use where [11] != [12].

WITH CTE AS (
    SELECT * 
    FROM 
    (
        SELECT ID, colName, val
        FROM tblName
        UNPIVOT
        (
            val
            FOR colName IN ([p_x],[p_y],[p_z])
        ) unpiv
    ) src
    PIVOT
    (
        MAX(val)
        FOR ID IN ([11], [12])
    ) piv
)
SELECT colName
--SELECT *
FROM CTE WHERE [11] != [12]

If there are only a few columns in the table, it's easy to simply put [p_x],[p_y],[p_z], but obviously it's not convenient to type 50 or more columns. Even though you may use this trick to drag and drop, or copy/paste, the column names from the table, it's still bulky. And for that, you may use the SELECT * EXCEPT strategy with dynamic sql.

DECLARE @TSQL NVARCHAR(MAX), @colNames NVARCHAR(MAX)
SELECT @colNames = COALESCE(@colNames + ',' ,'') + [name] 
FROM syscolumns WHERE name  <> 'ID' and id = (SELECT id FROM sysobjects WHERE name = 'tablelName')

SET @TSQL = '
    WITH CTE AS (
        SELECT * 
        FROM 
        (
            SELECT ID, colName, val
            FROM tablelName
            UNPIVOT
            (
                val
                FOR colName IN (' + @colNames + ')
            ) unpiv
        ) src
        PIVOT
        (
            MAX(val)
            FOR ID IN ([11], [12])
        ) piv
    )
    --SELECT colName
    SELECT *
    FROM CTE WHERE [11] != [12]
'
EXEC sp_executesql @TSQL

score 0 · Answer 3 · answered Jan 28 '15 at 15:11

Here's one way using UNPIVOT:

;WITH
    cte AS
    (
        SELECT      CASE WHEN t1.p_x <> t2.p_x THEN 1 ELSE 0 END As p_x,
                    CASE WHEN t1.p_y <> t2.p_y THEN 1 ELSE 0 END As p_y,
                    CASE WHEN t1.p_z <> t2.p_z THEN 1 ELSE 0 END As p_z
        FROM        MyTable t1, MyTable t2
        WHERE       t1.ID = 11 AND t2.ID = 12 -- enter the two revisions to compare here
    )

SELECT      *
FROM        cte
UNPIVOT     (
                Changed FOR ColumnName IN (p_x, p_y, p_z)
            ) upvt
WHERE       upvt.Changed = 1

You have to add code to handle NULLs during the comparisons. You can also build the query dynamically if there are lots of columns in your table.

Dudi Konfino · Answer 4 · 2015-02-02T10:07:26.050

for sql server 2012 you can do something like that (duplicate it for each column):

  SELECT iif((p_x != lead(p_x) over(ORDER BY p_x)),
                                       (SELECT COLUMN_NAME 
                                        FROM INFORMATION_SCHEMA.COLUMNS
                                        WHERE TABLE_NAME = 'tbl' 
                                        AND 
                                        TABLE_SCHEMA='schema' 
                                        AND 
                                        ORDINAL_POSITION='1')
                                                 ,NULL)
FROM tbl

for sql server 2008 try

DECLARE @x int =11  -- first id
WHILE @x!=(SELECT count(1) FROM tbl)
BEGIN --comparison of two adjacent rows
if (SELECT p_x  FROM tbl WHERE id=@x)!=(SELECT p_x  FROM tbl WHERE id=@x+1)

 BEGIN
 SELECT COLUMN_NAME
 FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_NAME = 'tbl'  --insert your table
 AND 
 TABLE_SCHEMA='schema'   --insert your schema
 AND 
 ORDINAL_POSITION='1'  --first column 'p_x'
END
set @x=@x+1
END

Compare two rows and identify columns whose values are different

4 Answers4

Linked