Make string_agg() return unique values only

Question

I am working in SQL Server 2017 and I have the following two tables:

create table Computer (
  Id int Identity(1, 1) not null,
  Name varchar(100) not null,
  constraint pk_computer primary key (Id)
);

create table HardDisk (
  Id int Identity(1, 1) not null,
  Interface varchar(100) not null,
  ComputerId int not null,
  constraint pk_harddisk primary key (Id),
  constraint fk_computer_harddisk foreign key (ComputerId) references Computer(Id)
);

I have data such as:

Query

My current query is the following:

-- select query
select c.Id as computer_id,
       string_agg(cast(hd.Interface as nvarchar(max)), ' ') as hard_disk_interfaces
from Computer c
left join HardDisk hd on c.Id = hd.ComputerId
group by c.Id;

This gets me the following:

 computer_id | hard_disk_interfaces
-------------+----------------------
 1           | SATA SAS
 2           | SATA SAS SAS SAS SATA

However, I only want the distinct values, I'd like to end up with:

 computer_id | hard_disk_interfaces
-------------+----------------------
 1           | SATA SAS
 2           | SATA SAS

I tried to put distinct in front of the string_agg, but that didn't work.

Here's a db-fiddle.

Incorrect syntax near the keyword 'distinct'.

Does this answer your question? [Produce DISTINCT values in STRING_AGG](https://stackoverflow.com/q/51646385/11683) — GSerg, Nov 10 '21 at 19:23
I saw both, but didn’t really understand how I can apply them. — J86, Nov 10 '21 at 19:26

eshirvana · Answer 1 · 2021-11-10T19:54:21.827

1

string_agg is missing that feature , so you have to prepare the distinct list you want then aggregate them :

select id , string_agg(interface,' ') hard_disk_interfaces
from (
   select distinct c.id, interface
   from Computer c
   left join HardDisk hd on c.Id = hd.ComputerId
) t group by id

for your original query :

select *
from ....
join (
 <query above> ) as temp 
...
group by ... , hard_disk_interfaces

edited Nov 10 '21 at 19:54

answered Nov 10 '21 at 19:26

eshirvana

23,227
3
22
38

Thank you @eshirvana, although this works for my simplified scenario, when I actually try to make it work in my [actual query](https://gist.github.com/JwanKhalaf/ec2933fb34fff1a738702cc9e746e34e), it doesn't seem to do it. I still get duplicates. Look for `temp`, I've tried it within a `inner join` as it has to fit among all the other joins. – J86 Nov 10 '21 at 19:46
@J86 It looks like you query could do with a good dose of window functions and `OUTER APPLY`, which would cut down the size and speed it up quite a bit, as well as probably getting rid of duplicates If you create a https://dbfiddle.uk with some actual data we can see what could be done – Charlieface Nov 10 '21 at 20:00
Are window functions supported on `Microsoft SQL Server 2017 (RTM-GDR) (KB4583456) - 14.0.2037.2 (X64) Nov 2 2020 19:19:59 Copyright (C) 2017 Microsoft Corporation Standard Edition (64-bit) on Windows Server 2019 Standard 10.0 (Build 17763: )` I can try creating a fiddle with some data, it'll take a while cause there are soo many tables – J86 Nov 10 '21 at 20:14
@J86 yes they are , sure prepare a fiddle – eshirvana Nov 11 '21 at 21:50

score 0 · Answer 2 · answered Nov 10 '21 at 19:52

A couple of other ways:

;WITH cte AS 
(
  SELECT c.Id, Interface = CONVERT(varchar(max), hd.Interface)
    FROM dbo.Computer AS c
    LEFT OUTER JOIN dbo.HardDisk AS hd ON c.Id = hd.ComputerId
    GROUP BY c.Id, hd.Interface
)
SELECT Id, STRING_AGG(Interface, ' ')
FROM cte
GROUP BY Id;

or

SELECT c.Id, STRING_AGG(x.Interface, ' ')
FROM dbo.Computer AS c
OUTER APPLY
(
  SELECT Interface = CONVERT(varchar(max), Interface) 
    FROM dbo.HardDisk WHERE ComputerID = c.Id
    GROUP BY Interface
) AS x
GROUP BY c.Id;

Example db<>fiddle

If you are getting duplicates in a larger query with more joins, I would argue those duplicates are not duplicates coming out of STRING_AGG(), but rather duplicate rows coming from one or more of your 47 joins, not from this portion of the query. And I would guess that you still get those duplicates even if you leave out this join altogether.

Make string_agg() return unique values only

2 Answers2