Get unique values using STRING_AGG in SQL Server

Question

The following query returns the results shown below:

SELECT 
    ProjectID, newID.value
FROM 
    [dbo].[Data] WITH(NOLOCK)  
CROSS APPLY 
    STRING_SPLIT([bID],';') AS newID  
WHERE 
    newID.value IN ('O95833', 'Q96NY7-2')

Results:

ProjectID   value
---------------------
2           Q96NY7-2
2           O95833
2           O95833
2           Q96NY7-2
2           O95833
2           Q96NY7-2
4           Q96NY7-2
4           Q96NY7-2

Using the STRING_AGG function (added in SQL Server 2017), as shown in the following query, I can get the result set below.

SELECT 
    ProjectID,
    STRING_AGG(newID.value, ',') WITHIN GROUP (ORDER BY newID.value) AS NewField
FROM
    [dbo].[Data] WITH(NOLOCK)  
CROSS APPLY 
    STRING_SPLIT([bID],';') AS newID  
WHERE 
    newID.value IN ('O95833', 'Q96NY7-2')  
GROUP BY 
    ProjectID
ORDER BY 
    ProjectID

Results:

ProjectID   NewField
-------------------------------------------------------------
2           O95833,O95833,O95833,Q96NY7-2,Q96NY7-2,Q96NY7-2
4           Q96NY7-2,Q96NY7-2

I would like my final output to have only unique elements as below:

ProjectID   NewField
-------------------------------
2           O95833, Q96NY7-2
4           Q96NY7-2

Any suggestions on how to get this result? Please feel free to refine my query or redesign it from scratch if needed.

So you have data stored as delimited values and now you want to split them, find distinct values and finally cram them all back into a delimited string? YUCK!!! Delimited data violates 1NF. That is why you are struggling so much here. You will have to use STUFF and FOR XML with DISTINCT thrown in to do this after you first split it. — Sean Lange, May 29 '18 at 16:38
Any simple example on how to use the STUFF and FOR XML with DISTINCT in my dataset? I can't avoid STRING_SPLIT as unfortunately the raw data is stored as delimited values as you realised. — gkoul, May 29 '18 at 16:44
And be careful with that NOLOCK hint. https://blogs.sentryone.com/aaronbertrand/bad-habits-nolock-everywhere/ — Sean Lange, May 29 '18 at 16:45
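
The STUFF / FOR XML PATH approach mentioned in the comments could look roughly like the sketch below. The table variable @Data and its rows are made-up sample data (the value P12345 is hypothetical), chosen to mimic the question's output; the real table is [dbo].[Data]. STRING_SPLIT is still used for the split step, so this assumes SQL Server 2016 or later.

```sql
-- Hypothetical sample data standing in for [dbo].[Data].
DECLARE @Data TABLE (ProjectID int, bID varchar(200));
INSERT INTO @Data (ProjectID, bID) VALUES
    (2, 'Q96NY7-2;O95833'),
    (2, 'O95833;Q96NY7-2'),
    (2, 'O95833;Q96NY7-2;P12345'),
    (4, 'Q96NY7-2'),
    (4, 'Q96NY7-2;P12345');

-- Step 1: split and deduplicate per project.
WITH Split AS (
    SELECT DISTINCT d.ProjectID, s.value
    FROM @Data AS d
    CROSS APPLY STRING_SPLIT(d.bID, ';') AS s
    WHERE s.value IN ('O95833', 'Q96NY7-2')
)
-- Step 2: concatenate with FOR XML PATH, then strip the leading comma with STUFF.
SELECT p.ProjectID,
       STUFF((SELECT ',' + s.value
              FROM Split AS s
              WHERE s.ProjectID = p.ProjectID
              ORDER BY s.value
              FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '') AS NewField
FROM (SELECT DISTINCT ProjectID FROM Split) AS p
ORDER BY p.ProjectID;
-- For this sample data: 2 -> O95833,Q96NY7-2 and 4 -> Q96NY7-2
```

On SQL Server 2017 and later, STRING_AGG over a DISTINCT derived table is usually the shorter option.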

score 53 · Accepted Answer · answered May 29 '18 at 16:43


Use the DISTINCT keyword in a subquery to remove duplicates before aggregating the results:

SELECT
    ProjectID,
    STRING_AGG(value, ',') WITHIN GROUP (ORDER BY value) AS NewField
FROM (
    SELECT DISTINCT ProjectID, newID.value
    FROM [dbo].[Data] WITH(NOLOCK)
    CROSS APPLY STRING_SPLIT([bID], ';') AS newID
    WHERE newID.value IN ('O95833', 'Q96NY7-2')
) x
GROUP BY ProjectID
ORDER BY ProjectID
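
To try this pattern without the original table, a self-contained sketch might look like the following. The table variable @Data and its rows are invented sample data (P12345 is a hypothetical extra value), and the NOLOCK hint is left out because it has no bearing on the deduplication.

```sql
-- Hypothetical sample data standing in for [dbo].[Data].
DECLARE @Data TABLE (ProjectID int, bID varchar(200));
INSERT INTO @Data (ProjectID, bID) VALUES
    (2, 'Q96NY7-2;O95833'),
    (2, 'O95833;Q96NY7-2'),
    (2, 'O95833;Q96NY7-2;P12345'),
    (4, 'Q96NY7-2'),
    (4, 'Q96NY7-2;P12345');

SELECT x.ProjectID,
       STRING_AGG(x.value, ',') WITHIN GROUP (ORDER BY x.value) AS NewField
FROM (
    -- DISTINCT runs across all rows of a project, so cross-row duplicates disappear here.
    SELECT DISTINCT d.ProjectID, s.value
    FROM @Data AS d
    CROSS APPLY STRING_SPLIT(d.bID, ';') AS s
    WHERE s.value IN ('O95833', 'Q96NY7-2')
) AS x
GROUP BY x.ProjectID
ORDER BY x.ProjectID;
-- For this sample data: 2 -> O95833,Q96NY7-2 and 4 -> Q96NY7-2
```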

JohnLBevan


ps. A couple of comments on the documentation have asked for `DISTINCT` keyword support in this function, so maybe MS will consider adding it in a future version: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-2017 – JohnLBevan May 29 '18 at 16:48
Please give your vote for DISTINCT here: https://feedback.azure.com/forums/908035-sql-server/suggestions/35243533-support-distinct-for-string-agg – ValGe Dec 05 '18 at 07:55
Updated link to vote for DISTINCT: https://feedback.azure.com/d365community/idea/0e8fa860-7c25-ec11-b6e6-000d3a4f0da0 – Mike Schall Dec 06 '21 at 19:40

score 8 · Answer 2 · answered Jul 07 '21 at 19:14

This is a function I wrote that answers the question title. Improvements welcome!

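CREATE OR ALTER FUNCTION [dbo].[fn_DistinctWords]
(
  @String NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
  DECLARE @Result NVARCHAR(MAX);
  -- Split on spaces, keep each distinct token once, then join the tokens again.
  -- Consecutive spaces yield empty-string tokens, and the output order is not guaranteed.
  WITH MY_CTE AS (SELECT DISTINCT value FROM STRING_SPLIT(@String, ' '))
  SELECT @Result = STRING_AGG(value, ' ') FROM MY_CTE;
  RETURN @Result;
END
GO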

Use like:

SELECT dbo.fn_DistinctWords('One Two      Three Two One');
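
Because STRING_SPLIT returns an empty string for each pair of consecutive delimiters, the repeated spaces in this input produce an empty token. A possible variation, shown inline rather than as a function, filters those out and fixes the output order with WITHIN GROUP; the variable name @String is only reused here for illustration.

```sql
DECLARE @String nvarchar(max) = N'One Two      Three Two One';

SELECT STRING_AGG(t.value, ' ') WITHIN GROUP (ORDER BY t.value) AS DistinctWords
FROM (
    SELECT DISTINCT value
    FROM STRING_SPLIT(@String, ' ')
    WHERE value <> ''   -- drop the empty tokens created by repeated spaces
) AS t;
-- Returns: One Three Two
```

The words come back in alphabetical order, not in order of first appearance.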

score 6 · Answer 3 · answered May 29 '18 at 16:44


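You can use DISTINCT in the subquery used for the APPLY. This removes duplicates within each row's bID; values repeated across different rows of the same ProjectID are not affected: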

SELECT d.ProjectID,
       STRING_AGG(newID.value, ',') WITHIN GROUP (ORDER BY newID.value) AS NewField
FROM [dbo].[Data] d
CROSS APPLY
     (SELECT DISTINCT value
      FROM STRING_SPLIT(d.[bID], ';') AS newID
     ) newID
WHERE newID.value IN ('O95833', 'Q96NY7-2')
GROUP BY d.ProjectID;


Gordon Linoff


This is especially useful if you have more than one other column besides the one that needs to be split and aggregated. – Sander de Jong Jul 20 '22 at 13:46

score 3 · Answer 4 · answered May 29 '18 at 16:44

As @SeanLange pointed out in the comments, this is a terrible way to store and pull out the data, but if you have to, do it in two steps, a DISTINCT derived table and an outer aggregation, as follows:

SELECT 
    ProjectID
    ,STRING_AGG( val, ',') WITHIN GROUP (ORDER BY val) AS NewField
FROM
(
    SELECT DISTINCT 
        ProjectID
        ,newID.value AS val
    FROM 
        [dbo].[Data] WITH(NOLOCK)  
        CROSS APPLY STRING_SPLIT([bID],';') AS newID  
    WHERE 
        newID.value IN ('O95833' , 'Q96NY7-2') 
) t
GROUP BY
    ProjectID

That should do it.

score 3 · Answer 5 · answered Apr 12 '22 at 08:17

Here is my improvement on @ttugates's function to make it more generic:

CREATE OR ALTER FUNCTION [dbo].[fn_DistinctList]
(
  @String NVARCHAR(MAX),
  @Delimiter char(1)
)
RETURNS NVARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
  DECLARE @Result NVARCHAR(MAX);
  WITH MY_CTE AS (SELECT DISTINCT value FROM STRING_SPLIT(@String, @Delimiter))
  SELECT @Result = STRING_AGG(value, @Delimiter) FROM MY_CTE;
  RETURN @Result;
END
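
One way this function might be applied to the question's scenario is to wrap the aggregated string in it. The sketch assumes dbo.fn_DistinctList has already been created as above; the table variable @Data and its rows are invented sample data (P12345 is a hypothetical extra value).

```sql
-- Assumes dbo.fn_DistinctList from the answer above already exists.
-- Hypothetical sample data standing in for [dbo].[Data].
DECLARE @Data TABLE (ProjectID int, bID varchar(200));
INSERT INTO @Data (ProjectID, bID) VALUES
    (2, 'Q96NY7-2;O95833'),
    (2, 'O95833;Q96NY7-2'),
    (4, 'Q96NY7-2'),
    (4, 'Q96NY7-2;P12345');

SELECT d.ProjectID,
       -- Aggregate first (with duplicates), then let the function deduplicate the list.
       dbo.fn_DistinctList(STRING_AGG(s.value, ','), ',') AS NewField
FROM @Data AS d
CROSS APPLY STRING_SPLIT(d.bID, ';') AS s
WHERE s.value IN ('O95833', 'Q96NY7-2')
GROUP BY d.ProjectID
ORDER BY d.ProjectID;
```

The function has no WITHIN GROUP clause, so the order of the items inside NewField is not guaranteed.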

score 2 · Answer 6 · answered Feb 05 '21 at 11:20

Another way to get unique strings from STRING_AGG is to perform these three steps after building the comma-separated string:

1. Split the string (STRING_SPLIT)
2. Select DISTINCT from the splits
3. Apply STRING_AGG again to a select with a group on a single key

Example:

(SELECT STRING_AGG(CAST(value AS VARCHAR(MAX)), ',')
 FROM (SELECT DISTINCT 1 AS single_key, value
       FROM STRING_SPLIT(STRING_AGG(CAST(customer_division AS VARCHAR(MAX)), ','), ',')) q
 GROUP BY single_key) AS customer_division

score 0 · Answer 7 · answered Dec 15 '20 at 16:50

You can create a view that returns the distinct rows of the table and aggregate over the view instead, which is even simpler:

CREATE TABLE Test (field1 varchar(1), field2 varchar(1));
GO

-- DISTINCT alone is enough here; the original GROUP BY on the same columns was redundant.
CREATE VIEW DistinctTest AS
SELECT DISTINCT field1, field2 FROM Test;
GO

INSERT INTO Test VALUES ('A', '1');
INSERT INTO Test VALUES ('A', '2');
INSERT INTO Test VALUES ('A', '2');
INSERT INTO Test VALUES ('A', '2');
INSERT INTO Test VALUES ('D', '1');
INSERT INTO Test VALUES ('D', '1');

SELECT STRING_AGG(field1, ',') FROM Test WHERE field2 = '1';          /* duplicates: A,D,D */

SELECT STRING_AGG(field1, ',') FROM DistinctTest WHERE field2 = '1';  /* no duplicates: A,D */

score -5 · Answer 8 · answered Jul 28 '21 at 07:35


Oracle (since version 19c) supports LISTAGG(DISTINCT ...), but Microsoft SQL Server probably does not offer an equivalent for STRING_AGG.


sasynkamil


OP was not asking about Oracle – Daniel L. VanDenBosch Sep 08 '21 at 17:45
