63

I'm using the STRING_AGG function in SQL Server 2017. I'd like to create the same effect as COUNT(DISTINCT <column>). I tried STRING_AGG(DISTINCT <column>,',') but that is not legal syntax.

I'd like to know if there is a T-SQL work-around. Here is my sample:

WITH Sitings 
  AS
  (
    SELECT * FROM (VALUES 
      (1, 'Florida', 'Orlando', 'bird'),
      (2, 'Florida', 'Orlando', 'dog'),
      (3, 'Arizona', 'Phoenix', 'bird'),
      (4, 'Arizona', 'Phoenix', 'dog'),
      (5, 'Arizona', 'Phoenix', 'bird'),
      (6, 'Arizona', 'Phoenix', 'bird'),
      (7, 'Arizona', 'Phoenix', 'bird'),
      (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
  ) 
SELECT State, City, COUNT(DISTINCT Siting) [# Of Types], STRING_AGG(Siting,',') Animals
FROM Sitings 
GROUP BY State, City

The above produces the following result:

+---------+-----------+--------------+-------------------------+
|  State  |   City    | # Of Types   |         Animals         |
+---------+-----------+--------------+-------------------------+
| Arizona | Flagstaff |            1 | dog                     |
| Florida | Orlando   |            2 | dog,bird                |
| Arizona | Phoenix   |            2 | bird,bird,bird,dog,bird |
+---------+-----------+--------------+-------------------------+

The output is exactly what I want, except I want the concatenated "Animals" listed for Phoenix Arizona to be DISTINCT, like this:

+---------+-----------+--------------+--------------------+
|  State  |   City    | # Of Types   |      Animals       |
+---------+-----------+--------------+--------------------+
| Arizona | Flagstaff |            1 | dog                |
| Florida | Orlando   |            2 | dog,bird           |
| Arizona | Phoenix   |            2 | bird,dog           |
+---------+-----------+--------------+--------------------+

Any ideas?

When I use my real data set, which is much larger, I get an error about the "Animals" column exceeding 8000 characters.

My question I think is the same as this one, except my example is much simpler.

Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
Brent Arias
  • 29,277
  • 40
  • 133
  • 234

4 Answers4

38

Here is one way to do it.

Since you want the distinct counts as well, it can be done simply by grouping the rows twice. The first GROUP BY will remove duplicates, the second GROUP BY will produce the final result.

WITH
Sitings
AS
(
    SELECT * FROM (VALUES 
    (1, 'Florida', 'Orlando', 'bird'),
    (2, 'Florida', 'Orlando', 'dog'),
    (3, 'Arizona', 'Phoenix', 'bird'),
    (4, 'Arizona', 'Phoenix', 'dog'),
    (5, 'Arizona', 'Phoenix', 'bird'),
    (6, 'Arizona', 'Phoenix', 'bird'),
    (7, 'Arizona', 'Phoenix', 'bird'),
    (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
)
,CTE_Animals
AS
(
    SELECT
        State, City, Siting
    FROM Sitings
    GROUP BY State, City, Siting
)
SELECT
    State, City, COUNT(1) AS [# Of Sitings], STRING_AGG(Siting,',') AS Animals
FROM CTE_Animals
GROUP BY State, City
ORDER BY
    State
    ,City
;

Result

+---------+-----------+--------------+----------+
|  State  |   City    | # Of Sitings | Animals  |
+---------+-----------+--------------+----------+
| Arizona | Flagstaff |            1 | dog      |
| Arizona | Phoenix   |            2 | bird,dog |
| Florida | Orlando   |            2 | bird,dog |
+---------+-----------+--------------+----------+

If you are still getting an error message about exceeding 8000 characters, then cast the values to varchar(max) before STRING_AGG.

Something like

STRING_AGG(CAST(Siting AS varchar(max)),',') AS Animals
Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
  • 6
    BTW: the explicit double GROUP BY is no performance drawback, since a DISTINCT in an aggregate function will do an implicit grouping (or sort distinct) too. When you compare the execution plans of your original query with those in this solution, you would find, that the double grouping is much faster (I tested it on a multi-milion row table and found a factor of ~3 in CPU time and factor ~12 in total execution time) – Thomas Franz Aug 10 '18 at 08:22
15

Here is one more way of doing it (sql fiddle):

  WITH Sitings 
  AS
  (
    SELECT * FROM (VALUES 
      (1, 'Florida', 'Orlando', 'bird'),
      (2, 'Florida', 'Orlando', 'dog'),
      (3, 'Arizona', 'Phoenix', 'bird'),
      (4, 'Arizona', 'Phoenix', 'dog'),
      (5, 'Arizona', 'Phoenix', 'bird'),
      (6, 'Arizona', 'Phoenix', 'bird'),
      (7, 'Arizona', 'Phoenix', 'bird'),
      (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
  ) 

select State,City,count(*) as [# Of Sitings],(select string_agg(value,', ') from (select distinct value from string_split(string_agg(Siting, ','),',')) t) AS Animals
FROM Sitings 
GROUP BY State, City

You may easily convert the splitting and merging part into a reusable scalar valued function.

NOTE

This is NOT an optimal solution, if you group first and then do aggregate (like answers above) it is better. Also, it does not get # of Types, it gets # of Sitings instead. However, it becomes handy as a quick inline function.

Ravi M Patel
  • 2,905
  • 2
  • 23
  • 32
  • I didn't understand how it works, but it works for me!) – dzhukov Feb 16 '22 at 04:37
  • @dzhukov, it groups by state and city first to get output like `bird, bird, dog` etc. Then it splits them again, gets distinct values and then joins them to get `bird, dog`. – Ravi M Patel Feb 17 '22 at 04:39
  • 1
    I love this for the utility of being able to do it in the select clause in one line! Thanks! To combine this with a character limit and concatenation - I use ,BigList = (SELECT STRING_AGG( CONVERT(varchar(max),VALUE),', ') FROM (SELECT DISTINCT VALUE FROM STRING_SPLIT(STRING_AGG(CONVERT(varchar(max),CONCAT(tbla.[ID],'-',[tbla.Name])),','),',')) t) – Jon vB Dec 01 '22 at 21:25
13

just use sub-query

WITH Sitings 
      AS
      (
        SELECT * FROM (VALUES 
          (1, 'Florida', 'Orlando', 'bird'),
          (2, 'Florida', 'Orlando', 'dog'),
          (3, 'Arizona', 'Phoenix', 'bird'),
          (4, 'Arizona', 'Phoenix', 'dog'),
          (5, 'Arizona', 'Phoenix', 'bird'),
          (6, 'Arizona', 'Phoenix', 'bird'),
          (7, 'Arizona', 'Phoenix', 'bird'),
          (8, 'Arizona', 'Flagstaff', 'dog')
        ) F (ID, State, City, Siting)
      ) 

    select State,City,count(*) as [# Of Types],STRING_AGG(Siting,',') AS Animals from 
    (
      SELECT State, City, Siting
    FROM Sitings 
    GROUP BY State, City,Siting
    ) as T  group by State,City

http://sqlfiddle.com/#!18/ba4b8/11

  State     City    # Of Types  Animals
Arizona     Flagstaff   1   dog
Florida     Orlando     2   bird,dog
Arizona     Phoenix     2   bird,dog
Zaynul Abadin Tuhin
  • 31,407
  • 5
  • 33
  • 63
7

You can use this in postgres. I am not sure about mysql. But this works in postgres.

select state, city, string_agg(distinct (siting), ', ') from sitings group by state, city;

This will aggregate only distinct values.

arti gupta
  • 109
  • 1
  • 5