T-SQL - Get a list of all As which have the same set of Bs

Question

I'm struggling with a tricky SQL query that I'm trying to write. Have a look at the following table:

+---+---+
| A | B |
+---+---+
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 3 | 2 |
| 3 | 3 |
| 4 | 2 |
| 4 | 3 |
| 4 | 4 |
+---+---+

Now, from this table, I essentially want a list of all As which have the exact same set of Bs and give each set an incrementing ID.

Hence, the output set for the above would be:

+---+----+
| A | ID |
+---+----+
| 1 |  1 |
| 3 |  1 |
| 2 |  2 |
| 4 |  2 |
+---+----+

Thanks.

Edit: If it helps, I have a list of all distinct values of B that are possible in another table.

Edit: Thank you so much for all the innovative answers. Was able to learn a lot indeed.

I have thought about taking a distinct list of As, then storing the CSV list of Bs for every A within a temp table. Finally I would find the entries in the table which have the same CSV strings. — Karthik Iyengar, Jun 06 '15 at 07:48
There is a commn trick to build CSV string using xml function, don't need a temp table: http://sqlandme.com/2011/04/27/tsql-concatenate-rows-using-for-xml-path/ — edc65, Jun 06 '15 at 07:50
Yes, I am aware of that, but I was hoping for a more elegant solution. Nevertheless, your question about the additional rows is thought provoking. I may have to change the desired output a bit. — Karthik Iyengar, Jun 06 '15 at 07:52

Giorgi Nakeuri · Accepted Answer · 2015-06-06T14:30:18.673

5

Here is mathematical trick to solve your tricky select:

with pow as(select *, b * power(10, row_number() 
              over(partition by a order by b)) as rn from t)
select a, dense_rank() over( order by sum(rn)) as rn 
from pow
group by a
order by rn, a

Fiddle http://sqlfiddle.com/#!3/6b98d/11

This of course will work only for limited distinct count as you will get overflow. Here is more general solution with strings:

select a, 
dense_rank() over(order by (select  '.' + cast(b as varchar(max))
                            from t t2 where t1.a = t2.a
                            order by b
                            for xml path(''))) rn
from t t1
group by a
order by rn, a

Fiddle http://sqlfiddle.com/#!3/6b98d/29

edited Jun 06 '15 at 14:30

answered Jun 06 '15 at 09:04

Giorgi Nakeuri

35,155
8
47
75

3

Although I upvoted, you should change two things. First, you really need a separator for the `b` values in the string solution (how many elements are in "123"). Second, you should never use `varchar()` without a length parameter in SQL Server. The default length varies by context; being explicit prevents further errors. – Gordon Linoff Jun 06 '15 at 13:17
@GordonLinoff, you are absolutely right. 123 and 12 and 3 would give same rank. Good catch! – Giorgi Nakeuri Jun 06 '15 at 14:31

edc65 · Answer 2 · 2015-06-06T13:25:16.277

3

Something like this:

select a, dense_rank() over (order by g) as id_b
from (
  select a,
  (select b from MyTable s where s.a=a.a order by b FOR XML PATH('')) g
  from MyTable a
  group by a
) a
order by id_b,a

Or maybe using a CTE (I avoid them when possible)

Sql Fiddle

As a side note, this is the output of the inner query using the sample data in the question:

a   g
1   <b>2</b><b>3</b>
2   <b>2</b><b>3</b><b>4</b>
3   <b>2</b><b>3</b>
4   <b>2</b><b>3</b><b>4</b>

edited Jun 06 '15 at 13:25

answered Jun 06 '15 at 09:11

edc65

462
7
19

Row number will not give you the right result set, if I am not mistaken. – Karthik Iyengar Jun 06 '15 at 09:33
You were right, sorry. I think it can be done with row_number, but dense_rank is better fit. – edc65 Jun 06 '15 at 10:26
See my comment @Giorgi's equivalent solution. – Gordon Linoff Jun 06 '15 at 13:17
@Gordon There **is** a separator in my solution. It is the default xlm open/close tag. If it is the right separator, it depends upon the range and type of b values. – edc65 Jun 06 '15 at 13:21

score 2 · Answer 3 · edited May 23 '17 at 12:09

2

EDIT I am changing the code, but it will get bigger now, took help from Concatenate many rows into a single text string? for concatinating strings

Select [A],
   Left(M.[C],Len(M.[C])-1) As [D] into #tempSomeTable
From
(
    Select distinct T2.[A], 
        (
            Select Cast(T1.[B] as VARCHAR) + ',' AS [text()]
            From sometable T1
            Where T1.[A] = T2.[A]
            ORDER BY T1.[A]
            For XML PATH ('')
        ) [C]
    From sometable T2
 )M

  SELECT t.A, DENSE_RANK() OVER(ORDER BY t.[D]) [ID] FROM
  #tempSomeTable t
   inner join
  (SELECT [D] FROM(
  SELECT [D], COUNT([A]) [D_A] from 
   #tempSomeTable t
  GROUP BY [D] )P where [C_A]>1)t1 on t1.[D]=t.[D]

edited May 23 '17 at 12:09

Community

1
1

answered Jun 06 '15 at 08:29

debatanu

766
5
11

Sum? try with `2,4,6` and `1,3,8` – edc65 Jun 06 '15 at 08:31
Thanks, changed it to string comparison. – debatanu Jun 06 '15 at 09:07
Added the concatinating string logic, now it should be giving the required output – debatanu Jun 06 '15 at 10:05
Step by step you are arriving to my answer :) . No need of a temp table. – edc65 Jun 06 '15 at 10:17
... and mine is approaching yours too. – edc65 Jun 06 '15 at 10:27
Your solution looks much simpler. But you still need to consider the case where there are other mismatching patterns, such as (5,2),(5,4) in the above example. That needs to be excluded. – debatanu Jun 06 '15 at 10:50
Tried adding rows (5,2) and (5,4), there is a new id_b == 3, no problem. Can you explain what worries you? – edc65 Jun 06 '15 at 13:03
I guess, what I wanted to say is that in the cases of (5,2) and (5,4) should there be an output with id_b==3? Because then we would get an A which doesn't have a set in the table. – debatanu Jun 06 '15 at 13:43

StuartLC · Answer 4 · 2015-06-06T09:14:26.117

Here's a long winded approach, by finding sets with the same elements (using EXCEPT bidirectionally to eliminate, and just done a half diagonal cartesian product), then pairing equal sets up, stamping each pair with a ROW_NUMBER(), before unpivoting the pairs of A's into to your final output where the equivalent sets are projected as rows which have the same id.

WITH joinedSets AS
(
    SELECT t1.A as t1A, t2.A AS t2A
    FROM MyTable t1
    INNER JOIN MyTable t2
    ON t1.B = t2.B
        AND t1.A < t2.A
),
equalSets AS
(
    SELECT js.t1A, js.t2A, ROW_NUMBER() OVER (ORDER BY js.t1A) AS Id
    FROM joinedSets js
    GROUP BY js.t1A, js.t2A
    HAVING NOT EXISTS ((SELECT mt.B FROM MyTable mt WHERE mt.A = js.t1A)
           EXCEPT (SELECT mt.B FROM MyTable mt WHERE mt.A = js.t2A))
         AND NOT EXISTS ((SELECT mt.B FROM MyTable mt WHERE mt.A = js.t2A) 
           EXCEPT (SELECT mt.B FROM MyTable mt WHERE mt.A = js.t1A))
)
SELECT A, Id
FROM equalSets
UNPIVOT
(
    A
    FOR ACol in (t1A, t2A)
) unp;

SqlFiddle here

As it stands, this solution will only work with pairs of sets, not triples etc. A general NTuple type solution is probably possible (but beyond my brain right now).

score 2 · Answer 5 · answered Jun 06 '15 at 10:00

Here is a very simple, fast, but approximate solution. It is possible that CHECKSUM_AGG returns the same checksum for different sets of B.

DECLARE @T TABLE (A int, B int);

INSERT INTO @T VALUES
(1, 2),(1, 3),(2, 2),(2, 3),(2, 4),(3, 2),(3, 3),(4, 2),(4, 3),(4, 4);

SELECT
    A
    ,CHECKSUM_AGG(B) AS CheckSumB
    ,ROW_NUMBER() OVER (PARTITION BY CHECKSUM_AGG(B) ORDER BY A) AS GroupNumber
FROM @T
GROUP BY A
ORDER BY A, GroupNumber;

Result set

A    CheckSumB    GroupNumber
-----------------------------
1    1            1
2    5            1
3    1            2
4    5            2

For exact solution group by A and concatenate all B values into a long (binary) string using either FOR XML, CLR, or T-SQL function. Then you can partition ROW_NUMBER by that concatenated string to assign numbers to the groups. As shown in other answers.

score 0 · Answer 6 · answered Jun 27 '15 at 21:25

Here is an exact, rather than approximate, solution. It uses nothing more advanced than INNER JOIN and GROUP BY (and, of course, the DENSE_RANK() to get the ID you want).

It is also general, in that it allows for B values to be repeated within an A group.

SELECT   A,
         DENSE_RANK() OVER (ORDER BY MIN_EQUIVALENT_A) AS ID

FROM     (
          SELECT   MATCHES.A1 AS A,
                   MIN(MATCHES.A2) AS MIN_EQUIVALENT_A

          FROM     (
                    SELECT   T1.A AS A1,
                             T2.A AS A2,
                             COUNT(*) AS NUM_B_VALS_MATCHED

                    FROM     (
                              SELECT   A,
                                       B,
                                       COUNT(*) AS B_VAL_FREQ
                              FROM     MyTable
                              GROUP BY A,
                                       B
                             ) AS T1

                             INNER JOIN

                             (
                              SELECT   A,
                                       B,
                                       COUNT(*) AS B_VAL_FREQ
                              FROM     MyTable
                              GROUP BY A,
                                       B
                             ) AS T2

                             ON T1.B = T2.B
                                AND T1.B_VAL_FREQ = T2.B_VAL_FREQ

                    GROUP BY T1.A,
                             T2.A
                   ) AS MATCHES

                   INNER JOIN

                   (
                    SELECT   A,
                             COUNT(DISTINCT B) AS NUM_B_VALS_TOTAL
                    FROM     MyTable
                    GROUP BY A
                   ) AS CHECK_TOTALS_A1

                   ON MATCHES.A1 = CHECK_TOTALS_A1.A
                      AND MATCHES.NUM_B_VALS_MATCHED
                          = CHECK_TOTALS_A1.NUM_B_VALS_TOTAL

                   INNER JOIN

                   (
                    SELECT   A,
                             COUNT(DISTINCT B) AS NUM_B_VALS_TOTAL
                    FROM     MyTable
                    GROUP BY A
                   ) AS CHECK_TOTALS_A2

                   ON MATCHES.A2 = CHECK_TOTALS_A2.A
                      AND MATCHES.NUM_B_VALS_MATCHED
                          = CHECK_TOTALS_A2.NUM_B_VALS_TOTAL

          GROUP BY MATCHES.A1
         ) AS EQUIVALENCE_TABLE

ORDER BY 2,1
;

T-SQL - Get a list of all As which have the same set of Bs

6 Answers6

Linked