COUNT(DISTINCT) in multiple columns in SQL Server 2008

Question

In Oracle, it's possible to get a count of distinct values in multiple columns by using the || operator (according to this forum post, anyway):

SELECT COUNT(DISTINCT ColumnA || ColumnB) FROM MyTable

Is there a way to do this in SQL Server 2008? I'm trying to perform a single query to return some group statistics, but I can't seem to do it.

For example, here is a table of values I'm trying to query:

AssetId MyId    TheirId   InStock
328     10      10        1
328     20      20        0
328     30      30        0
328     40      10        0
328     10      10        0
328     10      10        0
328     10      10        0
328     10      10        0

For AssetId #328, I want to compute the total number of unique IDs in the MyId and TheirId columns (4 = 10, 20, 30, 40), as well as the total number of non-zero rows in the InStock column (1):

AssetId     TotalIds    AvailableIds
328         4           1

Is there a way to work this magic somehow?

This question on SO may help: [counting distinct over multiple columns](http://stackoverflow.com/questions/1471250/counting-distinct-over-multiple-columns) — Alex P, Aug 12 '12 at 07:20

score 2 · Accepted Answer · edited Jan 18 '21 at 12:30

2

You can use a cross apply and values.

select T1.AssetId,
       count(distinct T2.ID) TotalIds,
       sum(case T2.InStock when 0 then 0 else 1 end) AvailableIds 
from YourTable as T1
  cross apply(values(T1.MyId, T1.InStock),
                    (T1.TheirId, 0)
             ) as T2(ID, InStock)
group by T1.AssetId

SE-Data

Or you can do a union all in a sub query.

select T.AssetId,
       count(distinct T.ID) TotalIds,
       sum(case T.InStock when 0 then 0 else 1 end) AvailableIds 
from (
     select AssetId, MyId as ID, InStock
     from YourTable
     union all
     select AssetID, TheirId, 0
     from YourTable
     ) as T
group by T.AssetId

edited Jan 18 '21 at 12:30

Community

1
1

answered Aug 12 '12 at 09:54

Mikael Eriksson

136,425
22
210
281

1

Wow. +1 on coolness but the cross apply could be a performance killer. – Lieven Keersmaekers Aug 12 '12 at 10:01
@Lieven It does not look so on the execution plan. It will only do one table scan and the cross apply does a constant scan. But the solutions provided should of course be tested with a realistic amount of data. It might be a different story with a lot of rows. – Mikael Eriksson Aug 12 '12 at 10:04
Sorry for the redonk delay in marking your answer as accepted, and thank you for the workable solution. I was hoping for an elegant native solution, but in the absence of that I'll settle for a working one... cheers! – Mass Dot Net May 02 '13 at 02:08

score 2 · Answer 2 · answered Aug 12 '12 at 14:50

You can follow the Oracle example and concatenate the values together (that is what the Oracle query is doing). You just have to convert the values to characters first:

select AssetId,
       count(distinct cast(MyId as varchar(8000))+','+cast(TheirId as varchar(8000)
            ) totalIds,
       count(distinct case when inStock> 0
                           then cast(MyId as varchar(8000))+','+cast(TheirId as varchar(8000)
             end) as AvailableIds
from t
group by Assetid

You can also do it as a subquery:

select AssetId, count(*) as TotalIds,
       sum(case when inStock > 0 then 1 else 0 end) as AvailableIds
from (select AssetId, myId, theirId, max(inStock) as inStock
      from t
      group by AssetId, myId, theirId
     ) a
group by AssetId

"Theoretically", I like the second approach better, since it is more set-based. However, if you find yourself trying to count distinct combinations columns in several different variables, the string concatenation approach is more practical.

Ivan Golović · Answer 3 · 2012-08-12T08:22:04.860

You can get the result like this:

DECLARE @t TABLE (AssetId INT, MyId    INT, TheirId   INT, InStock INT)
INSERT @t
VALUES 
(328,10, 10,   1)
,(328,20, 20,   0)
,(328,30, 30,   0)
,(328,40, 10,   0)
,(328,10, 10,   0)
,(328,10, 10,   0)
,(328,10, 10,   0)
,(328,10, 10,   0)

;WITH a AS(
    SELECT  AssetId,
            COUNT(col) cnt
    FROM 
    (
            SELECT  MyId col, AssetId
            FROM    @t
            UNION
            SELECT  TheirId col, AssetId
            FROM    @t
    ) b
    GROUP BY AssetId
)

SELECT  a.AssetId,
        a.cnt TotalIds,
        SUM(CASE WHEN InStock <> 0 THEN 1 ELSE 0 END) AvailableIds
FROM    @t c
JOIN    a ON a.AssetId = c.AssetId
GROUP   BY a.AssetId, a.cnt

In Common Table Expression (WITH code block) 'uniqueness' is guaranteed by using UNION operator which discards duplicate values, that's why COUNT(col) doesn't need to be used like COUNT(DISTINCT col).

I was hoping to avoid using CTEs, since it kills the performance of my query, increasing it by over 100% when processing large datasets. This is the solution I already have, but I was hoping there was some clever way of doing it, like with Oracle's "||" operator. — Mass Dot Net, Aug 12 '12 at 09:08

score 1 · Answer 4 · answered Aug 12 '12 at 08:07

1

I think it's good solution for you

SELECT COUNT(*)
FROM (SELECT DISTINCT Column1, Column2 
      FROM MyTable) A

answered Aug 12 '12 at 08:07

David Weinberg

1,033
1
13
29

This does not show groups, just a single count. Not answering the question – Geochem B Jul 18 '19 at 23:56

Pankaj · Answer 5 · 2012-08-13T06:02:42.593

0

Option #1

select 
          AssetID, count(distinct MyId) As MyId, SUM(InStock) InStock
From T
Group By 
      AssetID

Option #2 Without CTE

Select AssetID, count(MyId), sum(InStock) InStock From
(
    select 
            AssetID, MyId, SUM(InStock) InStock
    From MyTable
    Group By 
            AssetID, MyId
)K
Group by AssetID

Option #3 With CTE

;With Sub(AssetID, MyId, InStock)
As
(
    select 
            AssetID, MyId, SUM(InStock) InStock
    From MyTable
    Group By 
            AssetID, MyId
)

Select AssetID, count(MyId), sum(InStock) From
(
    Select * from Sub
)K

edited Aug 13 '12 at 06:02

answered Aug 12 '12 at 09:19

Pankaj

9,749
32
139
283

The column `theirID` is also required. – Lieven Keersmaekers Aug 12 '12 at 09:37
@Lieven - I think `TheirID` is not required Please check the sample output shared in the Query. – Pankaj Aug 12 '12 at 10:21
Per OP *I want to compute the total number of unique IDs in the MyId and TheirId columns* I'm pretty sure that `theirID` *is* required. – Lieven Keersmaekers Aug 12 '12 at 10:46

score 0 · Answer 6 · answered Aug 12 '12 at 09:48

If you don't like to use CTE's, you can try using following solution. The gist of it is to

select the TotalID's for each AssetID in a seperate subquery
select the AvailableIDs for each AssetID in a seperate subquery
JOIN the results of both subqueries to produce the final results.

The statement as is works on the entire table. You can get the results for a single AssetID by adding an appropriate where clause to the entire group.

SQL Statement

SELECT  a.AssetId, t.TotalIDs, a.AvailableIDs
FROM    (
            SELECT  AssetID, TotalIDs = COUNT(*) 
            FROM    (
                        SELECT  AssetID 
                        FROM    MyTable 
                        GROUP BY 
                                MyId, TheirID, AssetID
                    ) t 
            GROUP BY 
                    AssetID
        ) AS t
        INNER JOIN (
            SELECT  AssetID, AvailableIDs = SUM(InStock) 
            FROM    MyTable 
            GROUP BY 
                    AssetID
        ) AS a ON a.AssetId = t.AssetId

Test script

;WITH MyTable (AssetId, MyId, TheirId, InStock) AS (
    SELECT * FROM (VALUES 
        (328, 10, 10, 1)
        , (328, 20, 20, 0)
        , (328, 30, 30, 0)
        , (328, 40, 10, 0)
        , (328, 10, 10, 0)
        , (328, 10, 10, 0)
        , (328, 10, 10, 0)
        , (328, 10, 10, 0)
        , (329, 10, 10, 0)
        , (329, 10, 20, 1)
    ) AS a (b, c, d, e)
)
SELECT  a.AssetId, t.TotalIDs, a.AvailableIDs
FROM    (
            SELECT  AssetID, TotalIDs = COUNT(*) 
            FROM    (
                        SELECT  AssetID 
                        FROM    MyTable 
                        GROUP BY 
                                MyId, TheirID, AssetID
                    ) t 
            GROUP BY 
                    AssetID
        ) AS t
        INNER JOIN (
            SELECT  AssetID, AvailableIDs = SUM(InStock) 
            FROM    MyTable 
            GROUP BY 
                    AssetID
        ) AS a ON a.AssetId = t.AssetId

COUNT(DISTINCT) in multiple columns in SQL Server 2008

6 Answers6

Option #1

Option #2 Without CTE

Option #3 With CTE

Linked