TSql equality on groups of rows

Question

I have a table that contains information on groups. There can be any number of members in a group. There is a group identifier and then an element identifier. I want to be able to in a single statement determine whether or not a given set exists in the table

@groupTable is an example of the data that already exists in the database

@inputData is the data that I want to see if it already exists in @groupTable

declare @groupData table
(
    groupIdentifier int,
    elementIdentifier uniqueidentifier
)

insert into @groupData values
(1, 'dfce40b1-3719-4e4c-acfa-65f728677700'),
(1, '89e7e6be-cee8-40a7-8135-a54659e0d88c')

declare @inputData table
(
    tempGroupIdentifier int,
    elementIdentifier uniqueidentifier
)

insert into @inputData values
(42, 'dfce40b1-3719-4e4c-acfa-65f728677700'),
(42, '89e7e6be-cee8-40a7-8135-a54659e0d88c'),
(55, 'dfce40b1-3719-4e4c-acfa-65f728677700'),
(55, '2395a42c-94f4-4cda-a773-221b26ea5e44'),
(55, 'f22db9df-a1f4-4078-b74c-90e34376eff6')

Now I want to run a query that will show the relationship of the sets, showing which groupIdentifier is associated with which tempGroupIdentifier. If there is no matching set then I need to know that too.

desired output:
groupIdentifier, tempGroupIdentifier
1, 42
null, 55

Does anyone any suggestions on how to approach this problem?

I could probably pivot the rows and concat all elementIdentifiers into a giant string for each group that then do equality on, but that doesn't seem like a good solution.

If you got both as tables (real and temp), wouldn't a left join do the job? or am I missing the real problem here. — Syska, Nov 09 '11 at 19:07
In your sample data, the value 'dfce40b1-3719-4e4c-acfa-65f728677700' exists twice in your @inputdata table; was that intentional? — Stuart Ainsworth, Nov 09 '11 at 19:09
@BrandonAGr: I assume you want to match complete sets? Or is it "@groupData set" is part of an "@inputData set" — gbn, Nov 09 '11 at 19:11
Is an error or by design that @inputData got duplicate guids for different tempGroupIdentifier ? — Syska, Nov 09 '11 at 19:13
@StuartAinsworth yes the duplicate value is intentional, to show that as long as there is one item in the group different then it is a completely different group — BrandonAGr, Nov 09 '11 at 19:47
@gbn yes I want to match complete sets. There could be a ton of other groups in groupData, but to make the example simple I only showed one group in it. — BrandonAGr, Nov 09 '11 at 19:48

gbn · Answer 1 · 2011-11-09T20:02:31.923

3

SELECT DISTINCT
    T1.tempgroupIdentifier, T2.GroupIdentifier
FROM
    (
    SELECT
        COUNT(*) OVER (PARTITION BY tempgroupIdentifier) AS GroupCount,
        ROW_NUMBER() OVER (PARTITION BY tempgroupIdentifier ORDER BY elementIdentifier) AS GroupRN,
        tempgroupIdentifier, elementIdentifier
    FROM
        @inputData
    ) T1
    LEFT JOIN
    (
    SELECT
        COUNT(*) OVER (PARTITION BY GroupIdentifier) AS GroupCount,
        ROW_NUMBER() OVER (PARTITION BY GroupIdentifier ORDER BY elementIdentifier) AS GroupRN,
        GroupIdentifier, elementIdentifier
    FROM
        @groupData
    ) T2 ON T1.elementIdentifier = T2.elementIdentifier AND 
                      T1.GroupCount = T2.GroupCount AND 
                      T1.GroupRN = T2.GroupRN

Edit: this will also deal with the same value in a given set

edited Nov 09 '11 at 20:02

answered Nov 09 '11 at 19:09

gbn

422,506
82
585
676

1

Top notch, other than theres a missing comma :-) ( Is there a way to edit it, without doing 5 other small edits ) – Syska Nov 09 '11 at 19:21
This doesn't seem to work when there are partial matches between groups of the same size. If the @inputData is changed to insert into `@inputData values (42, 'dfce40b1-3719-4e4c-acfa-65f728677700'), (42, '89e7e6be-cee8-40a7-8135-a54659e0d88c'), (55, 'dfce40b1-3719-4e4c-acfa-65f728677700'), (55, '2395a42c-94f4-4cda-a773-221b26ea5e44')` Then the query will return an incorrect result – BrandonAGr Nov 09 '11 at 19:54
`groupData- (1, '8C2E8990-090E-4411-8B5A-BA9D2F428631'), (1, 'EE8720E7-F986-4CD4-AA0D-F2E72EB6C74B'), (2, '8C2E8990-090E-4411-8B5A-BA9D2F428631'), (2, '4139C8C0-26A3-428E-9387-3049856CE414') inputData- (42, '8C2E8990-090E-4411-8B5A-BA9D2F428631'), (42, 'EE8720E7-F986-4CD4-AA0D-F2E72EB6C74B'), (55, '4139C8C0-26A3-428E-9387-3049856CE414'), (55, '9C1F73BB-9857-4675-9167-A245F182104A')` – BrandonAGr Nov 09 '11 at 20:15
That returns extra rows when there is a partial match from groupData, I edited the query and was able to return only complete matches with `group by T1.tempGroupIdentifier, T2.groupIdentifier, T2.GroupCount having COUNT(T2.groupIdentifier) = T2.GroupCount` I can always generate the list of non matches from this result – BrandonAGr Nov 09 '11 at 20:17

score 1 · Accepted Answer · answered Nov 09 '11 at 19:24

1

   SELECT 
        (
        CASE WHEN matchCount = gdCount AND matchCount = idCount 
            THEN groupIdentifier 
            ELSE NULL 
        END) groupIdentifier, 
        cj.tempGroupIdentifier 
    FROM
    (
    SELECT gd.groupIdentifier, id.tempGroupIdentifier, COUNT(1) matchCount
    FROM @groupData gd 
    CROSS JOIN @inputData id
    WHERE id.elementIdentifier = gd.elementIdentifier 
    GROUP BY gd.groupIdentifier, id.tempGroupIdentifier) as cj
    CROSS APPLY (SELECT COUNT(groupIdentifier) from @groupData gdca WHERE gdca.groupIdentifier = cj.groupIdentifier) as gdc(gdCount)
    CROSS APPLY (SELECT COUNT(tempGroupIdentifier) from @inputData idca WHERE idca.tempGroupIdentifier = cj.tempGroupIdentifier) as idc(idCount)

answered Nov 09 '11 at 19:24

ImplexOne

549
3
7

wow. I understand what you're doing but not sure it'll scale with that CROSS JOIN. And FYI about the COUNT http://stackoverflow.com/questions/1221559/count-vs-count1/1221649#1221649 – gbn Nov 09 '11 at 19:26
Can be improved by pre-qualifying matches with count match. (Two sets are equal if neither contains items not present in other set -> item counts are the same). Have not looked at exec plan – ImplexOne Nov 09 '11 at 20:08
Thanks that looks good. One change I made is to have it only return matches, so removed the case to just return groupIdentifier and adding a `where matchCount = gdCount and matchCount = idCount`. As it was it was returning duplicates when the groupData set was smaller(only contained 1 row) and the input group was a partial match(had 2 rows 1 of which matched the set from groupData) – BrandonAGr Nov 09 '11 at 20:32
Cool. The original did have where clause, but I changed it to return non matches too as requested ("If there is no matching set then I need to know that too.") – ImplexOne Nov 09 '11 at 20:39

TSql equality on groups of rows

2 Answers2