5

I have a table, horribly designed (not my doing thankfully), that stores data in a fashion similar to the following:

[key], [lease_id], [building_name], ~20 more columns of data

A lease_id can and will exist for a centre as well as head office. I've been asked to find all instances where data in a building for a lease doesn't match data in head office for the same lease.

I can do this, quite easily, with a self join. The challenge here is that there are about 20 columns to compare and although I could type each one in manually I was wondering if there's a better way to do this (which would also mean the query can be used in future, accounting for any table changes).

In syntaxtically ridiculous psuedo code- I want to do something similar to what the following would do if it were to work:

select  lp.*
from    lease_proposal lp
        inner join
        (
            select  *
            from    lease_proposal lp2
            where   building_id = '001' -- assume 001 is head office for sake of example
        ) lp2
            on lp2.lease_id = lp.lease_id
where   lp.* <> lp2.*
Michael A
  • 9,480
  • 22
  • 70
  • 114

1 Answers1

5

You could do an INTERSECT operation to find all rows where all data matched, then LEFT JOIN that result and select only the rows where there wasn't an intersection:

SELECT
    a.*
FROM
    lease_proposal a
LEFT JOIN
    (
        SELECT *
        FROM lease_proposal

        INTERSECT

        SELECT *
        FROM lease_proposal
        WHERE building_id = 001
    ) b ON a.lease_id = b.lease_id
WHERE
    b.lease_id IS NULL

If SQL Server supported it, you could also use a NATURAL LEFT JOIN like so:

SELECT  
    a.*
FROM
    lease_proposal a
NATURAL LEFT JOIN
    (
        SELECT *
        FROM lease_proposal
        WHERE building_id = 001
    ) b
WHERE b.lease_id IS NULL
Zane Bien
  • 22,685
  • 6
  • 45
  • 57
  • This isn't available in SQL Server, see: http://stackoverflow.com/questions/4826613/natural-join-in-sql-server – Michael A Jul 05 '12 at 04:57
  • This pushed my knowledge up a notch - really great post, thank-you – Michael A Jul 05 '12 at 05:43
  • 2
    Leaving `SELECT * FROM lease_proposal INTERSECT` out of the query would yield the same results. `INTERSECT` merely suppresses duplicate rows where `building_id = '001'`, if there are any. You are then using the resulting subset to find rows not matching it on another criteria, and it wouldn't signify if the subset contained duplicates or not, because you would still be wanting rows that *aren't* in the subset. – Andriy M Jul 05 '12 at 09:05
  • @AndriyM, argh!! You are indeed correct, and both the above queries would not return the desired results. You would need a way to exclude the `building_id` column from `*`, and I can't come up with any elegant way to do that. I wish there was a way to do something along the lines of `SELECT * EXCEPT building_id` but there isn't. @Michael, you may have to incorporate one of the clunky solutions presented [here](http://stackoverflow.com/questions/729197/sql-exclude-a-column-using-select-except-columna-from-tablea) into your query. – Zane Bien Jul 05 '12 at 10:26
  • @Michael: At this point, the only thing I can recommend is defining a view which manually selects or joins on all columns but the `building_id` and use that view throughout your application. That way, you at least have a single point of entry, and if you need to add/update/remove one of the columns in the underlying table, you only need to change the column selection in one view rather than in many of the same queries that may be scattered in multiple places throughout your system. Problems like yours are the reason why database design is so important!! – Zane Bien Jul 05 '12 at 10:29