concatenate and de-dupe multiple rows

Question

I have some incoming rows in the below format.

| Col1 | Col2 | Col3 |
| 1    | A    | 1    |
| 1    | A    | 1,2  |
| 1    | A    | 1,3  |
| 1    | A    | 2,4  |

Desired outputsql is

| Col1 | Col2 | Col3    |
| 1    | A    | 1,2,3,4 |

Basically, group all rows based on Col1 and Col2 and then concatenate and remove duplicates from Col3.

SELECT COL1, COL2, {?????}
FROM TABLEA
GROUP BY COL1, COL2;

I could not think much at this moment. Any pointers would be much appreciated. I am inclined to WX2 database, but any ANSI compliant snippet would be helpful.

Don't store data as comma separated items. It will only cause you lots of trouble. — jarlh, Mar 21 '17 at 08:45
Agree, but can't help with the existing design. Also, this part is from dynamic SQL which makes life difficult — Srini V, Mar 21 '17 at 08:49
Idea: 1.Split your string in col3 to multiple rows, 2. SELECT DISTINCT value 3. CONCATENATE your value in col3 with group of col1, col2. Your hard work is to find an implementation to your WX2 database. — Pham X. Bach, Mar 21 '17 at 08:58
@realspirituals,I posted one answer.I hope It's give your expected result. — Mansoor, Mar 21 '17 at 09:19

score 0 · Answer 1 · edited Mar 21 '17 at 11:33

For SQL Server: first concatenate all col3 values using STUFF method and INSERT INTO CTE table.Based on this CTE tables split all rows as individual into single column based on CTE table.Finally concate all DISTINCT strings with help of STUFF.

 CREATE TABLE #table ( Col1 INT ,   Col2 VARCHAR(10) , Col3 VARCHAR(10))
 INSERT INTO #table ( Col1  ,   Col2  , Col3 )
 SELECT  1   , 'A'   , '1'  UNION ALL 
 SELECT  1   , 'A'   , '1,2' UNION ALL
 SELECT  1   , 'A'   , '1,3'  UNION ALL
 SELECT  1   , 'A'   , '2,4' 

 ;WITH CTEValues ( Colval ) AS
 (
   SELECT STUFF ( ( SELECT ',' +  Col3 FROM #table T2 WHERE T2.Col2 =  
                T1.col2 FOR XML PATH('') ),1,1,'') 
   FROM #table T1 
   GROUP BY Col2
 )

 SELECT * INTO #CTEValues
 FROM CTEValues

 ;WITH CTEDistinct ( SplitValues , SplitRemain ) AS
 (
  SELECT SUBSTRING(Colval,0,CHARINDEX(',',Colval)),    
         SUBSTRING(Colval,CHARINDEX(',',Colval)+1,LEN(Colval))
  FROM #CTEValues
  UNION ALL
  SELECT CASE WHEN CHARINDEX(',',SplitRemain) = 0 THEN SplitRemain ELSE  
                   SUBSTRING(SplitRemain,0,CHARINDEX(',',SplitRemain)) END, 
         CASE WHEN CHARINDEX(',',SplitRemain) = 0 THEN '' ELSE                               
        SUBSTRING(SplitRemain,CHARINDEX(',',SplitRemain)+1,LEN(SplitRemain))          
        END
  FROM CTEDistinct
  WHERE SplitRemain <> ''
 )

  SELECT STUFF ( ( SELECT DISTINCT ',' +  SplitValues FROM CTEDistinct T2  
  FOR XML PATH('') ),1,1,'')

@realspirituals,STUFF Is default built in string function.Which DBMS are you using. — Mansoor, Mar 21 '17 at 10:04

score 0 · Answer 2 · answered Mar 21 '17 at 09:27

You could try with transpose or concatenation functions. The difficulty comes from the fact that col3 is varchar and a conversion is needed to get the distinct values. With MySQL :

SELECT col1, col2, GROUP_CONCAT(DISTINCT col3) AS col3 FROM
(SELECT col1, col2, CONVERT(SUBSTR(col3, 1), UNSIGNED INTEGER) AS col3 FROM (
SELECT 1 AS col1, 'A' AS col2, '1' AS col3 UNION ALL
SELECT 1 AS col1, 'A' AS col2, '1,2' AS col3 UNION ALL
SELECT 1 AS col1, 'A' AS col2, '1,3' AS col3 UNION ALL  
SELECT 1 AS col1, 'A' AS col2, '2,4' AS col3
) AS t
UNION ALL
SELECT col1, col2, CONVERT(SUBSTR(col3, 3), UNSIGNED INTEGER) AS col3 FROM (
SELECT 1 AS col1, 'A' AS col2, '1' AS col3 UNION ALL
SELECT 1 AS col1, 'A' AS col2, '1,2' AS col3 UNION ALL
SELECT 1 AS col1, 'A' AS col2, '1,3' AS col3 UNION ALL  
SELECT 1 AS col1, 'A' AS col2, '2,4' AS col3
) AS t1
) AS t2
WHERE col3 <> 0

Result :

col1 | col2 | col3 
1    |   A  | 1,2,3,4

Col3 may have any value, as the numbers are integers (no defined length) before converted into csv fields — Srini V, Mar 21 '17 at 09:53
In this case, with dynamic sql, create a first query on your table to select the max length of col 3 (with commas removed) in a variable. Say result is L. Then loop L times to generate the successive blocks of a similar query, simply adjusting for each block the index of the substr function. The WHERE clause here is needed to remove the 0s you will obtain for row having len(col3 without commas) < L. — mattt, Mar 21 '17 at 10:09

score 0 · Accepted Answer · answered Mar 21 '17 at 11:12

For Postgres use this:

select col1, col2, string_agg(distinct col3, ',') as col3
from (
   select col1, col2, x.col3
   from tablea, unnest(string_to_array(col3, ',')) as x(col3)
) t
group by col1, col2;

This is largely ANSI compliant except for the string_to_array() and string_agg() function.

concatenate and de-dupe multiple rows

3 Answers3