SQL Count distinct values within the field

Question

I have this weird scenario (at least it is for me) where I have a table (actually a result set, but I want to make it simpler) that looks like the following:

ID  |  Actions
------------------
1   |  10,12,15
2   |  11,12,13
3   |  15  
4   |  14,15,16,17

And I want to count the different actions in the all the table. In this case, I want the result to be 8 (just counting 10, 11, ...., 17; and ignoring the repeated values).

In case it matters, I am using MS SQL 2008.

If it makes it any easier, the Actions were previously on XML that looks like

<root>
  <actions>10,12,15</actions>
</root>

I doubt it makes it easier, but somebody might comeback with an xml function that I am not aware and just makes everything easier.

Let me know if there's something else I should say.

yes, sorry, somehow I thought I had included 14.. I'll edit it right away. Thanks! — bebopayan, Sep 18 '13 at 00:44
@bebopayan, Do you know your range of numbers? E.g. 1 to 100? If so you could use a populated numbers table instead of splitting the string. — plalx, Sep 18 '13 at 00:50
http://stackoverflow.com/questions/1642996/t-sql-pivot-but-for-semicolon-separated-values-instead-of-columns shows how to unpivot the csv data into a column, and then you can just run a COUNT DISTINCT over the column — Preet Sangha, Sep 18 '13 at 00:50
@RBarryYoung, I would if I could, but I didn't design how the data was stored and now I have to work with it. — bebopayan, Sep 18 '13 at 00:52
@plalx, I get your point, although some times there could be 350 numbers from up to 2000. I like it, I'll try it later. Thanks for the tip! — bebopayan, Sep 18 '13 at 00:56
@PreetSangha, I'll take a look into that thread. Thanks for the reference. — bebopayan, Sep 18 '13 at 00:56
@bebopayan Then change it or tell the owners to change it. Seriously, do it now while you can perfectly demonstrate what's wrong with it and before this pathological data design self-perpetuates any further. In the long run it will save "them" far more than it will cost to just fix it now. It is pointless to invest time and money into trying to make this abject violation of First Normal Form work properly in a relational data environment. — RBarryYoung, Sep 18 '13 at 01:00
@bebopayan - I work a field where you can't change the input sources. Just because you can't change the input data, there should be nothing stopping you cleansing the data once in the DB (such as unpivoting or whatever else you want to do with it) and using the clean data as the base from which to build. — Preet Sangha, Sep 18 '13 at 01:10
@bebopayan see my edit answer. I did it directly from the XML. — Fabien TheSolution, Sep 18 '13 at 02:25
@RBarryYoung, God I've done it, but apparently they don't want to change anything. I've already explained how the current DB lacks pretty much everything it should have. No PKs, lots of xmls fields (even the xml is missing a decent schema), no FKs relationships, FKs hidden in the xml, the naming convention is also awful. I really understand your point and I agree with it, it's just that the project is quite advanced and they just don't want to. And since I'm the new guy I have to deal with it... — bebopayan, Sep 19 '13 at 00:44
@PreetSangha, It's not the input data.. it's the data stored for the app I'm working at... That said I usually do not understand what they were thinking. I thought about cleaning the data and storing it... but I am not sure if duplicating that much of information would be healthy. — bebopayan, Sep 19 '13 at 00:48

score 1 · Accepted Answer · answered Sep 18 '13 at 01:52

Using approach similar to http://codecorner.galanter.net/2012/04/25/t-sql-string-deaggregate-split-ungroup-in-sql-server/:

First you need a function that would split string, there're many examples on SO, here's one of them:

CREATE FUNCTION dbo.Split (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
      SELECT 1, 1, CHARINDEX(@sep, @s)
      UNION ALL
      SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
      FROM Pieces
      WHERE stop > 0
    )
    SELECT pn,
      SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
    FROM Pieces
)

Using this you can run a simple query:

SELECT COUNT(DISTINCT S) FROM MyTable CROSS APPLY dbo.Split(',', Actions)

Here is the demo: http://sqlfiddle.com/#!3/5e706/3/0

Fabien TheSolution · Answer 2 · 2013-09-18T04:48:16.870

SQL Fiddle

MS SQL Server 2008 Schema Setup:

CREATE TABLE Table1
    ([ID] int, [Actions] varchar(11))
;

INSERT INTO Table1
    ([ID], [Actions])
VALUES
    (1, '10,12,15'),
    (2, '11,12,13'),
    (3, '15'),
    (4, '14,15,16,17')
;

Query 1:

DECLARE @S varchar(255)
DECLARE @X xml

SET @S = (SELECT Actions + ',' FROM Table1 FOR XML PATH(''))
SELECT @X = CONVERT(xml,'<root><s>' + REPLACE(@S,',','</s><s>') + '</s></root>')

SELECT count(distinct [Value])
FROM (
SELECT [Value] = T.c.value('.','varchar(20)')
FROM @X.nodes('/root/s') T(c)) AS Result
WHERE [Value] > 0

Results:

| COLUMN_0 |
|----------|
|        8 |

EDIT :

I think this is exactly what you are looking for :

SQL Fiddle

MS SQL Server 2008 Schema Setup:

Query 1:

DECLARE @X xml

SELECT @X = CONVERT(xml,replace('
<root>
  <actions>10,12,15</actions>
  <actions>11,12,13</actions>
  <actions>15</actions>
  <actions>14,15,16,17</actions>
</root>
',',','</actions><actions>'))

SELECT count(distinct [Value])
FROM (
SELECT [Value] = T.c.value('.','varchar(20)')
FROM @X.nodes('/root/actions') T(c)) AS Result

Results:

| COLUMN_0 |
|----------|
|        8 |

score 0 · Answer 3 · answered Sep 18 '13 at 02:23

A bit if a mess but here it is Create the function first and then call the lower code.

/* Helper Function */

CREATE FUNCTION dbo.Split (@sep char(1), @s varchar(8000))
RETURNS table
AS
RETURN (
    WITH splitter_cte AS (
      SELECT CHARINDEX(@sep, @s) as pos, 0 as lastPos
      UNION ALL
      SELECT CHARINDEX(@sep, @s, pos + 1), pos
      FROM splitter_cte
      WHERE pos > 0
     )
    SELECT SUBSTRING(@s, lastPos + 1,
                     case when pos = 0 then 80000
                     else pos - lastPos -1 end) as chunk
    FROM splitter_cte
  )
GO
---------------- End of Function 


/* Function Call */ 
Declare @Actions varchar(1000)
SELECT @Actions = STUFF((SELECT ',' + actions
          FROM tblActions
          ORDER BY actions
          FOR XML PATH('')), 1, 1, '')

SELECT  Distinct *
  FROM dbo.Split(',', @Actions)
OPTION(MAXRECURSION 0);

score 0 · Answer 4 · answered Sep 18 '13 at 02:44

0

If you have a table of Actions with one row per possible action id, you can do this with a join:

select count(distinct a.ActionId)
from t join
     Actions a
     on ','+t.Actions+',' like '%,'+cast(a.ActionId as varchar(255))+',%';

You could also create a table of numbers (using a CTE) if you know the actions are within some range.

answered Sep 18 '13 at 02:44

Gordon Linoff

1,242,037
58
646
786

This solution works but it can get really slow when talking about big data. Thanks for the answer tho! – bebopayan Sep 19 '13 at 00:27
@bebopayan . . . If you want efficiency, then you need an association table rather than storing ids in lists. – Gordon Linoff Sep 19 '13 at 01:46

SQL Count distinct values within the field

4 Answers4