2

The next table is a simplification of my problem in SQL Server:

ID  COLUMN_A    COLUMN_B    COLUMN_C
-------------------------------------
1    A            B           C
1    A            B           D
1    B            C           D

I want to get a group with the columns concatenated by comma without repeated values. I tried to use STRING_AGG() but it returns:

ID  COLUMN_A    COLUMN_B    COLUMN_C
-------------------------------------
1    A, A, B    B, B, C     C, D, D

This is the query I have done:

SELECT ID, STRING_AGG(COLUMN_A, ', ') AS COL_A, STRING_AGG(COLUMN_B, ', ') AS COL_B,
 STRING_AGG(COLUMN_C, ', ') AS COL_C   
FROM MYTABLE
GROUP BY ID;

I would like the next result:

ID  COLUMN_A    COLUMN_B    COLUMN_C
-------------------------------------
1    A, B        B, C        C, D

Thank you!

3 Answers3

1

Without using window functions. The union might slow things down, but give it a try and see if you can tolerate the performance.

 with 
 cte1 (id, col, indicator) as 
 
  (select id, column_a, 'col1' from t union
   select id, column_b, 'col2' from t union
   select id, column_c, 'col3' from t),
  
 cte2 (id, indicator, agg) as
  
  (select id, indicator, string_agg(col,',') 
   from cte1
   group by id, indicator)
  
 select id,
        max(case when indicator='col1' then agg end) as column_a,
        max(case when indicator='col2' then agg end) as column_b,
        max(case when indicator='col3' then agg end) as column_c
 from cte2
 group by id;
    
Radagast
  • 5,102
  • 3
  • 12
  • 27
  • I dont understand how it works exactly but works perfectly. It would be very helpful if you could explain a little bit the functionality. Thank you very much! – alejoaldana Jul 07 '20 at 17:32
  • @alejoaldana We start by stacking the dataset in a way that allows us to dedupe it and also indicate what column each stack indicates. Then you simply aggregate them. The case statement is there so you can have three columns conditional on the indicator. I have broken this code into pieces in this demo so you can understand how it works https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=6d475677a7e233cc57bfb9bf3a8a8c9f – Radagast Jul 07 '20 at 18:10
  • Thank you very much! Really helpful! – alejoaldana Jul 07 '20 at 19:04
0

Unfortunately, string_agg(distinct) doesn't work (yet). But you can do something a little bit more complicated:

SELECT ID,
      STRING_AGG(CASE WHEN seqnum_a = 1 THEN COLUMN_A, END ', ') AS COLUMN_A,
      STRING_AGG(CASE WHEN seqnum_b = 1 THEN COLUMN_B, END ', ') AS COLUMN_B,
      STRING_AGG(CASE WHEN seqnum_c = 1 THEN COLUMN_C, END ', ') AS COLUMN_C
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY ID, COLUMN_A ORDER BY ID) as seqnum_a,
             ROW_NUMBER() OVER (PARTITION BY ID, COLUMN_B ORDER BY ID) as seqnum_b,
             ROW_NUMBER() OVER (PARTITION BY ID, COLUMN_C ORDER BY ID) as seqnum_c
      FROM MYTABLE t
     ) t
GROUP BY ID;

So, although STRING_AGG() doesn't remove duplicates, it does ignore NULL values.

paone
  • 828
  • 8
  • 18
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you for your quick answer. It is throwing the next error, **The function ROW_NUMBER must have an OVER clause with ORDER BY.** – alejoaldana Jul 07 '20 at 16:52
0

Here is XML and XQuery based solution.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT, COLUMN_A CHAR(1), COLUMN_B CHAR(1), COLUMN_C CHAR(1));
INSERT INTO @tbl (ID, COLUMN_A, COLUMN_B, COLUMN_C)
VALUES
(1,'A','B','C'),
(1,'A','B','D'),
(1,'B','C','D');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = ',';

;WITH rs AS
(
    SELECT ID
        , CAST('<root><r><![CDATA[' + 
            REPLACE(STRING_AGG(COLUMN_A, ','), @separator, ']]></r><r><![CDATA[') +
            ']]></r></root>' AS XML) AS COL_A
        , CAST('<root><r><![CDATA[' + 
            REPLACE(STRING_AGG(COLUMN_B, ','), @separator, ']]></r><r><![CDATA[') +
            ']]></r></root>' AS XML) AS COL_B
        , CAST('<root><r><![CDATA[' + 
            REPLACE(STRING_AGG(COLUMN_c, ','), @separator, ']]></r><r><![CDATA[') +
            ']]></r></root>' AS XML) AS COL_C

    FROM @tbl
    GROUP BY ID
)
SELECT rs.ID
    , COL_A.query('for $i in distinct-values(/root/r/text())
       return if ($i eq (distinct-values(/root/r/text())[last()])[1]) then $i
             else concat($i, sql:variable("@separator"))
    ').value('.', 'NVARCHAR(MAX)') AS COL_A
    , COL_B.query('for $i in distinct-values(/root/r/text())
       return if ($i eq (distinct-values(/root/r/text())[last()])[1]) then $i
             else concat($i, sql:variable("@separator"))
    ').value('.', 'NVARCHAR(MAX)') AS COL_B
    , COL_C.query('for $i in distinct-values(/root/r/text())
       return if ($i eq (distinct-values(/root/r/text())[last()])[1]) then $i
             else concat($i, sql:variable("@separator"))
    ').value('.', 'NVARCHAR(MAX)') AS COL_C
FROM rs;

Output

+----+-------+-------+-------+
| ID | COL_A | COL_B | COL_C |
+----+-------+-------+-------+
|  1 | A, B  | B, C  | C, D  |
+----+-------+-------+-------+
Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21