2

I have a ms sql database table where I want to find all the unique combinations of categories (CatId) that subscribers (SubId) have signed up for.

SubId CatId Cat 
4     39    Google Play
4     40    Kobo
4     43    Other
5     39    Google Play
5     43    Other
7     49    Amazon
7     39    Google Play
7     40    Kobo
6     39    Google Play
6     40    Kobo
6     43    Other
8     49    Amazon
8     39    Google Play
8     40    Kobo
9     38    Barnes & Noble
9     41    Smashwords

The output would hopefully be like: (Where groupId is counter for the combinations)

GroupId CatId   Cat         Occurances
1        39     Google Play     2
1        40     Kobo            2
1        43     Other           2
2        39     Google Play     1
2        43     Other           1
3        49     Amazon          2
3        39     Google Play     2
3        40     Kobo            2
4        38     Barnes & Noble  1
4        41     Smashwords      1

Any help would be appreciated

Tom
  • 273
  • 4
  • 8
  • 1
    Where does the `GroupId` come from? –  Nov 27 '13 at 10:53
  • 1
    Its basically a counter for the combinations of what the subscribers have selected, without it theres no way of grouping the categories together in the combinations. – Tom Nov 27 '13 at 11:01

2 Answers2

2

The key to this is to first get a single row for each subID, and all their combinations by concatenating the rows into a single column using the SQL Server XML extensions:

SELECT  T.SubID,
        Combinations = STUFF((  SELECT  ',' + t2.Cat
                                FROM    T t2
                                WHERE   t.SubID = t2.SubID
                                ORDER BY t2.Cat
                                FOR XML PATH(''), TYPE
                            ).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM    T
GROUP BY T.SubID;

This gives:

SUBID   COMBINATIONS
------+-------------------------
4     | Google Play,Kobo,Other
5     | Google Play,Other
6     | Google Play,Kobo,Other
7     | Amazon,Google Play,Kobo
8     | Amazon,Google Play,Kobo
9     | Barnes & Noble,Smashwords

You could just perform a simple count on this result set:

WITH Combinations AS
(   SELECT  T.SubID,
            Combinations = STUFF((  SELECT  ',' + t2.Cat
                                    FROM    T t2
                                    WHERE   t.SubID = t2.SubID
                                    ORDER BY t2.Cat
                                    FOR XML PATH(''), TYPE
                                ).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM    T
    GROUP BY T.SubID
)
SELECT  Combinations, Occurances = COUNT(*)
FROM    Combinations
GROUP BY Combinations;

Which would give:

COMBINATIONS              | OCCURANCES
--------------------------+------------
Amazon,Google Play,Kobo   |     2
Barnes & Noble,Smashwords |     1
Google Play,Kobo,Other    |     2
Google Play,Other         |     1

Or to get the output you have shown, you need to join this back to your main table, and use the Combinations column above to group on:

WITH Combinations AS
(   SELECT  T.SubID,
            Combinations = STUFF((  SELECT  ',' + t2.Cat
                                    FROM    T t2
                                    WHERE   t.SubID = t2.SubID
                                    ORDER BY t2.Cat
                                    FOR XML PATH(''), TYPE
                                ).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM    T
    GROUP BY T.SubID
)
SELECT  GroupID = DENSE_RANK() OVER(ORDER BY c.Combinations),
        T.CatID,
        T.Cat,
        Occurances = COUNT(DISTINCT T.SubID)
FROM    T
        INNER JOIN Combinations c
            ON c.SubID = T.SubID
GROUP BY T.CatID, T.Cat, c.Combinations;

Example on SQL Fiddle

Community
  • 1
  • 1
GarethD
  • 68,045
  • 10
  • 83
  • 123
0

how about

select 
dense_rank() over (order by subid) as GroupID
,catID
,catName
,COUNT(*) as Occurances
 from table
group by subid, catID, catName
Kiril Rusev
  • 745
  • 3
  • 9