TSQL Comparing two Sets

Question

When two sets are given

s1 ={ a,b,c,d} s2={b,c,d,a}

(i.e)

TableA

Item
a
b
c
d

TableB

Item
b
c
d
a

How to write Sql query to display "Elements in tableA and tableB are equal". [Without using SP or UDF]

Output

Elements in TableA and TableB contains identical sets

btw, if you need to **search** for matching sets within a set *collection*, that is a different problem, and the solutions here may not help much. — Peter Radocchia, Sep 07 '10 at 20:01

score 9 · Accepted Answer · answered Sep 07 '10 at 19:11

9

Use:

SELECT CASE 
         WHEN   COUNT(*) = (SELECT COUNT(*) FROM a) 
            AND COUNT(*) = (SELECT COUNT(*) FROM b) THEN 'Elements in TableA and TableB contains identical sets'
         ELSE 'TableA and TableB do NOT contain identical sets'
       END
  FROM (SELECT a.col
          FROM a
        INTERSECT
        SELECT b.col
          FROM b) x

Test with:

WITH a AS (
  SELECT 'a' AS col
  UNION ALL
  SELECT 'b'
  UNION ALL
  SELECT 'c'
  UNION ALL
  SELECT 'd'),
     b AS (
  SELECT 'b' AS col
  UNION ALL
  SELECT 'c'
  UNION ALL
  SELECT 'd'
  UNION ALL
  SELECT 'a')
SELECT CASE 
         WHEN   COUNT(*) = (SELECT COUNT(*) FROM a) 
            AND COUNT(*) = (SELECT COUNT(*) FROM b) THEN 'yes'
         ELSE 'no'
       END
  FROM (SELECT a.col
          FROM a
        INTERSECT
        SELECT b.col
          FROM b) x

answered Sep 07 '10 at 19:11

OMG Ponies

325,700
82
523
502

1

Very interesting info on the subtree costs. – Denis Valeev Sep 07 '10 at 19:41
That's 2 reads per table + 1 join. Does no one know how to use `FULL JOIN`? – Peter Radocchia Sep 07 '10 at 19:55
@Peter: I ran your FULL JOIN option, there's an extremely wide performance margin between your FULL JOIN and using INTERSECT or EXCEPT (on SS2005 anyway). I imagine the slightly higher cost for INTERSECT (vs Nix's EXCEPT version) is due to the counts to ensure the proper msg is displayed. – OMG Ponies Sep 07 '10 at 20:08
@Emtucifier: The FULL JOIN is worse based on the subtree cost value - see respective comments to each answer for details. – OMG Ponies Sep 07 '10 at 21:31

Peter Radocchia · Answer 2 · 2010-09-07T19:47:09.920

7

Something like this, using FULL JOIN:

SELECT
  CASE 
    WHEN EXISTS (
      SELECT * FROM s1 FULL JOIN s2 ON s1.Item = s2.Item
      WHERE s1.Item IS NULL OR s2.Item IS NULL
      )
    THEN 'Elements in tableA and tableB are not equal'
    ELSE 'Elements in tableA and tableB are equal'
  END

This has the virtue of short-circuiting on the first non-match, unlike other solutions that require 2 full scans of each table (once for the COUNT(*), once for the JOIN/INTERSECT).

Estimated cost is significantly less than other solutions.

edited Sep 07 '10 at 19:47

answered Sep 07 '10 at 19:27

Peter Radocchia

10,710
2
34
56

My initial approach. I guess it's not fun enough! :) – Denis Valeev Sep 07 '10 at 19:43
Actual Subtree cost (for me) of 0.0178565 - *barely* lower than Denis's answer, slightly better alternative than cmsjr's answer ... [Consideration for cmsjr's question about SS2000 alternatives](http://msdn.microsoft.com/en-us/library/aa259187%28SQL.80%29.aspx). – OMG Ponies Sep 07 '10 at 20:05
Bah, late to the party, FULL JOIN was going to be my answer, too. – ErikE Sep 07 '10 at 21:05
@Emtucifor: and what a party it was! Must be the I-need-a-distraction time of day. – Peter Radocchia Sep 07 '10 at 21:14

Denis Valeev · Answer 3 · 2010-09-07T19:35:35.657

3

My monstrocity:

;with SetA as
(select 'a' c union
select 'b' union
select 'c') 
, SetB as 
(select 'b' c union
select 'c' union
select 'a' union 
select 'd'
) 
select case (select count(*) from (
select * from SetA except select * from SetB
union 
select * from SetB except select * from SetA
)t)
when 0 then 'Equal' else 'NotEqual' end 'Equality'

edited Sep 07 '10 at 19:35

answered Sep 07 '10 at 19:07

Denis Valeev

5,975
35
41

Subtree cost of 0.0178567 - a hair better than cmsjr's answer – OMG Ponies Sep 07 '10 at 19:31
My AEP says that the STC is 0.01129. But then again, I just ran the query as a whole, without preparing those sets beforehand. – Denis Valeev Sep 07 '10 at 19:46
I run two passes, to check the query is returning valid output for false & positive samples. I wouldn't post the cost if the query doesn't pass prelim testing. – OMG Ponies Sep 07 '10 at 20:11

score 3 · Answer 4 · answered Sep 07 '10 at 19:13

3

Watch out, I'm gonna use a Cross Join.

Declare @t1 table(val varchar(20))
Declare @t2 table(val varchar(20))


insert into @t1 values ('a')
insert into @t1 values ('b')
insert into @t1 values ('c')
insert into @t1 values ('d')


insert into @t2 values ('c')
insert into @t2 values ('d')
insert into @t2 values ('b')
insert into @t2 values ('a')

select 
    case when 
    count(1) = 
    (((Select count(1) from @t1) 
    + (Select count(1) from @t2)) / 2.0) 
    then 1 else 0 end as SetsMatch  from 
@t1 t1 cross join @t2 t2 
where t1.val = t2.val

answered Sep 07 '10 at 19:13

cmsjr

56,771
11
70
62

Subtree cost of 0.0178713 - the `INTERSECT` & `EXCEPT` alternatives perform more efficiently. – OMG Ponies Sep 07 '10 at 19:29
looks like an inner join to me. – Peter Radocchia Sep 07 '10 at 19:29
@cmsjr: [CROSS JOIN is supported on SQL Server 2000](http://msdn.microsoft.com/en-us/library/aa259187%28SQL.80%29.aspx), so I guess you mean about how [INTERSECT and EXCEPT](http://msdn.microsoft.com/en-us/library/ms188055.aspx) are 2005+ functionality? Upgrade =) – OMG Ponies Sep 07 '10 at 19:35
1

@Peter I have no idea what you're talking about. – cmsjr Sep 07 '10 at 19:38
@OMG, I clearly deleted that comment and I don't appreciate you replying to it. My new comment was going to be as follows. – cmsjr Sep 07 '10 at 19:39
1

OMG @OMG, where is your sense of fun. – cmsjr Sep 07 '10 at 19:39
@cmsjr: Sorry, I was being cheeky. I don't see an issue with using the cross join on SS2K, but the performance difference is immense when using 2005+ functionality. – OMG Ponies Sep 07 '10 at 19:43
I was being cheeky too, no offense taken (or I hope given) at any point. The solution is clearly not performant, it just seemed like a valid if inefficient opportunity to whip out the ol' cross join. – cmsjr Sep 07 '10 at 19:48
1

`a cross join b where a.val = b.val` is equivalent to `a inner join b on a.val = b.val`, and will be rewritten as an inner join by the optimizer. check the execution plans, they will be identical. – Peter Radocchia Sep 07 '10 at 19:52
That they are. Same execution plan if you drop the join altogether too. Once again, cross join antics are just a bust... – cmsjr Sep 07 '10 at 19:59
@cmsjr: Peter's answer is slightly faster, and 2000 compliant - best SS2K candidate so far. – OMG Ponies Sep 07 '10 at 20:06
Thanks for the heads up, have up-voted in his general direction (and yours) – cmsjr Sep 07 '10 at 20:28
That isn't actually a cross join as you know, and using the words CROSS JOIN when you don't intend one is unnecessary obfuscation. – ErikE Sep 07 '10 at 21:07
wow, uh thanks for reading all the comments before adding your own. – cmsjr Sep 07 '10 at 21:12
Well then how about this angle, since all the variants reduce to the same execution plan, and my actual intent is to compare all values to all other values, doesn't the cross join more explicitly express that intent better than the inner join or the absence of a join operator? – cmsjr Sep 07 '10 at 21:32

user1978162 · Answer 5 · 2013-01-17T18:25:48.930

Since this thread was very helpful to me, I thought I'd share my solution.

I had a similar problem, perhaps more generally applicable than this specific single-set comparison. I was trying to find the id of an element that had a set of multi-element child elements that matched a query set of multi-element items.

The relevant schema information is:

table events, pk id
table solutions, pk id, fk event_id -> events
table solution_sources, fk solutionid -> solutions
   columns unitsourceid, alpha

Query: find the solution for event with id 110 that has the set of solution_sources that match the set of (unitsourceid, alpha) in ss_tmp. (This can also be done without the tmp table, I believe.)

Solution:

with solutionids as (
  select y.solutionid from (
     select ss.solutionid, count(ss.solutionid) x 
        from solutions s, solution_sources ss 
        where s.event_id = 110 and ss.solutionid = s.id 
        group by ss.solutionid
  ) y where y.x = ( select count(*) from ss_tmp )
) 
select solutionids.solutionid  from solutionids where
(
select case
   when count(*) = ( select count(*) from ss_tmp ) then true
   else false
   end
    from 
       ( SELECT unitsourceid, alpha FROM solution_sources 
            where solutionid = solutionids.solutionid
          INTERSECT
         SELECT unitsourceid, alpha FROM ss_tmp ) x
)

Tested against a test query of 4 items and a test db that had a matching solution (same number of child elements, each that matched), several completely non-matching solutions, and 1 solution that had 3 matching child elements, 1 solution that had all 4 matching child elements, plus an additional child, and 1 solution that had 4 child elements of which 3 of the 4 matched the query. Only the id of the true match was returned.

thanks a lot -Linus

score 1 · Answer 6 · answered Sep 07 '10 at 19:08

1

Could do it with EXCEPT and a case

select 
   case 
     when count (1)=0 
        then 'Elements in TableA and TableB contains identical sets' 
     else 'Nope' end from (
       select item from s1
      EXCEPT 
       select item from s2
) b

answered Sep 07 '10 at 19:08

Nix

57,072
29
149
198

2

s2 could have more items than s1, but this query would claim they are identical. – Peter Radocchia Sep 07 '10 at 19:20
Some would argue the sets are still the same, you could then assert the counts if u wanted, or u could do row checksums if you are really bored. – Nix Sep 07 '10 at 19:21
+1: On SQL Server 2005, this query has a better subtree cost of 0.0000459 vs my version using INTERSECT (0.0000702). I imagine my versions' higher cost is due to the COUNT comparisons. – OMG Ponies Sep 07 '10 at 19:26
1

This query is incorrect. If s2 has all the rows to match s1 but has additional nonmatching rows, this query will still return that they have identical sets. – ErikE Sep 07 '10 at 21:09
-1 -- I agree with @Peter and @Emtucifor that this gives false positives. – onedaywhen Sep 09 '10 at 09:45

Thomas Phaneuf · Answer 7 · 2022-03-27T02:34:50.357

Use EXCEPT statement

When using the EXCEPT statement to test if two sets contain the same rows, you will need to do the EXCEPT in both directions (A EXCEPT B and B EXCEPT A). If either comparison returns any records, then the sets are different. If no records are returned by either, they are the same.

The nice thing about this is that you can do this comparison with any number of specific columns and NULL values are handled implicitly without having to jump through hoops to compare them.

A good use case for this is verifying that saving a set of records happened correctly, especially when affecting an existing set.

SELECT  IsMatching = (1 ^ convert(bit, count(*)))
FROM    (
        SELECT  Mismatched = 1     -- Can be any column name
        FROM    (
                    SELECT  Item   -- Can have additional columns
                    FROM    TableA
                    EXCEPT
                    SELECT  Item   -- Can have additional columns
                    FROM    TableB
                ) as A
        UNION
        SELECT  Mismatched = 1     -- Can be any column name
        FROM    (
                    SELECT  Item   -- Can have additional columns 
                    FROM    TableB
                    EXCEPT
                    SELECT  Item   -- Can have additional columns
                    FROM    TableA
                ) as A
        ) as A

TSQL Comparing two Sets

7 Answers7

Linked