grouping rows by csv column TSQL

Question

I have a table with following sample data:

**CategoriesIDs         RegistrantID**
47                              1
276|275|278|274|277             4
276|275|278|274|277           16261
NULL                      16262 
NULL                      16264
NULL                      16265
NULL                      16266
NULL                      16267
NULL                          16268
NULL                     16269
NULL                     16270   
276|275|278                  16276
276|275|278|274|277          16292
276|275|278|274|277          16293
276|275|278|274|277          16294
276|275|278|274|277          16295 
276|275|278|274|277          16302
276|275|278|274|277          16303
276|275|278|274|277          16304
276|275|278|274|277          16305
276|275|278|274|277          16306
276|275|278|274|277          16307

I need to know :

1). which category has how many regisrtantids (like 277 has how many registrantids) 2). group the registrants by category so that i can find which registrants are in category 277 for example)

Do I need to create a function which generates a table from csv ? I have created a function but not sure if it will work in this situation with IN clause.

Please suggest

Consider to normalize your data. It will save you lots of pain when it comes to query your data. — peterm, Aug 24 '13 at 07:36
@peterm I understand that but that may not be possible right now. — DotnetSparrow, Aug 24 '13 at 07:41
Using a table function and joining ist should give you the base for the needed evaluations. There are many functions like this on SO like e.g. [F_SplitAsIntTable](http://stackoverflow.com/a/16993267/1699210) — bummi, Aug 24 '13 at 07:53

score 0 · Answer 1 · answered Aug 24 '13 at 07:58

If you are looking for output below

Category    Reg Count
277         12
274         12
47          1
276         13
278         13
275         13

SQL FIDDLE DEMO

Try this

SELECT Category,COUNT([RegistrantID]) AS [Reg Count] FROM
(
    SELECT 
        Split.a.value('.', 'VARCHAR(100)') AS Category
        ,[RegistrantID]
    FROM 
    (
        SELECT 
        CONVERT(XML,'<C>'+REPLACE([CategoriesIDs],'|','</C><C>') + '</C>') AS Categories
        , [RegistrantID] 
        FROM table1
    ) T CROSS APPLY  Categories.nodes('/C') AS Split(a)
) T1
GROUP BY Category

score 0 · Answer 2 · answered Aug 24 '13 at 09:31

You should normalise your data.

That said, try this.

;with c as (
    select RegistrantID, CategoriesIDs, 0 as start, CHARINDEX('|', CategoriesIDs) as sep 
    from yourtable    
    union all
    select RegistrantID,CategoriesIDs, sep, CHARINDEX('|', CategoriesIDs, sep+1) from c
    where sep>0

)
    select *, count(*) over (partition by CategoriesID)
    from
    (   
        select convert(int,SUBSTRING(CategoriesIDs,start+1,chars)) as [CategoriesID],
               RegistrantID
        from
        (
            select *, 
                                 Case sep when 0 then LEN(CategoriesIDs) else sep-start-1 end as chars
            from c
        ) v
    ) c2
    order by CategoriesID

score 0 · Answer 3 · answered Aug 24 '13 at 12:07

If you have a "Categories" table, you can do this with the following query:

select c.CategoryId, count(*)
from t join
     categories c
     on ','+cast(c.CategoryId as varchar(255))+',' like '%,'+CategoriesId+',%'
group by c.CategoryId;

This will not be particularly efficient. But neither will breaking the string apart. You should really have an association table with one row per item (in your original table) and category.

grouping rows by csv column TSQL

3 Answers3