I have a column that I generate with the `STRING_AGG` function:

```sql
STRING_AGG(ISNULL(CONVERT(NVARCHAR(MAX), T1.id), N'N/A'), ',') AS old_id_list
```
This concatenates all the values in each `GROUP BY` group into one comma-separated string.
My original table looks like this:
new_id | old_id | amount |
---|---|---|
a | 1 | 10 |
a | 1 | 20 |
a | 2 | 30 |
a | 2 | 40 |
a | 3 | 50 |
Applying the `STRING_AGG` call above (grouped by `new_id`), I get this output:
new_id | old_id_list | amount_total |
---|---|---|
a | 1,1,2,2,3 | 150 |
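Here is a minimal script to reproduce this. It assumes a temp table named `#T1` with an `INT` `old_id` column. The table name and column types are guesses, not the real schema. `WITHIN GROUP (ORDER BY old_id)` is there only so the list order is deterministic.

```sql
-- Hypothetical temp table mirroring the sample data above.
DROP TABLE IF EXISTS #T1;
CREATE TABLE #T1 (
    new_id CHAR(1) NOT NULL,
    old_id INT     NULL,
    amount INT     NOT NULL
);
INSERT INTO #T1 (new_id, old_id, amount)
VALUES ('a', 1, 10), ('a', 1, 20), ('a', 2, 30), ('a', 2, 40), ('a', 3, 50);

-- Every row feeds STRING_AGG, so repeated old_id values repeat in the list.
SELECT new_id,
       STRING_AGG(ISNULL(CONVERT(NVARCHAR(MAX), old_id), N'N/A'), ',')
           WITHIN GROUP (ORDER BY old_id) AS old_id_list,
       SUM(amount) AS amount_total
FROM #T1
GROUP BY new_id;
-- expected: a | 1,1,2,2,3 | 150
```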
However, I want to remove the repeated IDs without changing the computed `amount_total` column.
Expected output:
new_id | old_id_list | amount_total |
---|---|---|
a | 1,2,3 | 150 |
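One way to get this on SQL Server 2019 is to aggregate in two steps. This is a sketch, not a tested production query, and it uses a hypothetical `#T1` table. The first step collapses duplicate `old_id` rows per `new_id` but sums their amounts, so the total is unchanged. The second step runs `STRING_AGG` over the now-unique `old_id` values and adds up the subtotals.

```sql
-- Hypothetical temp table mirroring the sample data above.
DROP TABLE IF EXISTS #T1;
CREATE TABLE #T1 (new_id CHAR(1) NOT NULL, old_id INT NULL, amount INT NOT NULL);
INSERT INTO #T1 (new_id, old_id, amount)
VALUES ('a', 1, 10), ('a', 1, 20), ('a', 2, 30), ('a', 2, 40), ('a', 3, 50);

WITH per_old AS (
    -- Step 1: one row per (new_id, old_id). Amounts are summed, not dropped.
    SELECT new_id, old_id, SUM(amount) AS amount_sub
    FROM #T1
    GROUP BY new_id, old_id
)
-- Step 2: each old_id now appears once per new_id, so the list has no duplicates.
SELECT new_id,
       STRING_AGG(ISNULL(CONVERT(NVARCHAR(MAX), old_id), N'N/A'), ',')
           WITHIN GROUP (ORDER BY old_id) AS old_id_list,
       SUM(amount_sub) AS amount_total  -- 30 + 70 + 50
FROM per_old
GROUP BY new_id;
-- expected: a | 1,2,3 | 150
```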
The solutions I found online combine `DISTINCT` with the `ARRAY_AGG` function, but SQL Server does not have `ARRAY_AGG`. Trying it fails with:

`'ARRAY_AGG' is not a recognized built-in function name.`

I also tried adding the `DISTINCT` keyword inside `STRING_AGG`, but that didn't work either. I cannot remove the repeated `old_id` values before calling `STRING_AGG()`, because that would change the total amount computation.
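`STRING_AGG` on SQL Server 2019 does not accept `DISTINCT`, but it does skip `NULL` inputs. The sketch below relies on that and again uses a hypothetical `#T1` table. It numbers the rows within each `(new_id, old_id)` pair with `ROW_NUMBER` and passes only the first of each pair to `STRING_AGG`. Every row still feeds `SUM(amount)`, so a single `GROUP BY` is enough.

```sql
-- Hypothetical temp table mirroring the sample data above.
DROP TABLE IF EXISTS #T1;
CREATE TABLE #T1 (new_id CHAR(1) NOT NULL, old_id INT NULL, amount INT NOT NULL);
INSERT INTO #T1 (new_id, old_id, amount)
VALUES ('a', 1, 10), ('a', 1, 20), ('a', 2, 30), ('a', 2, 40), ('a', 3, 50);

SELECT new_id,
       -- Rows with rn > 1 yield NULL, which STRING_AGG ignores.
       STRING_AGG(CASE WHEN rn = 1
                       THEN ISNULL(CONVERT(NVARCHAR(MAX), old_id), N'N/A')
                  END, ',')
           WITHIN GROUP (ORDER BY old_id) AS old_id_list,
       -- Every row, duplicate or not, contributes its amount.
       SUM(amount) AS amount_total
FROM (
    SELECT new_id, old_id, amount,
           -- 1 for the first row of each (new_id, old_id) pair, 2+ for repeats.
           ROW_NUMBER() OVER (PARTITION BY new_id, old_id ORDER BY (SELECT NULL)) AS rn
    FROM #T1
) AS numbered
GROUP BY new_id;
-- expected: a | 1,2,3 | 150
```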
TL;DR: I am trying to replicate PySpark's `collect_set()` in SQL Server.
https://spark.apache.org/docs/3.2.0/api/python/reference/api/pyspark.sql.functions.collect_set.html
I'm using SQL Server 2019 (v15.0.2095.3)
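A rough analog of `collect_set()` builds the set of distinct values separately from the total and then joins the two. The sketch below uses a hypothetical `#T1` table and `CROSS APPLY` over a per-`new_id` `SELECT DISTINCT`. `collect_set()` makes no promise about order. This version adds `WITHIN GROUP (ORDER BY ...)`, so the list comes out sorted.

```sql
-- Hypothetical temp table mirroring the sample data above.
DROP TABLE IF EXISTS #T1;
CREATE TABLE #T1 (new_id CHAR(1) NOT NULL, old_id INT NULL, amount INT NOT NULL);
INSERT INTO #T1 (new_id, old_id, amount)
VALUES ('a', 1, 10), ('a', 1, 20), ('a', 2, 30), ('a', 2, 40), ('a', 3, 50);

SELECT g.new_id,
       ids.old_id_list,
       g.amount_total
FROM (
    -- Totals are computed over all rows, duplicates included.
    SELECT new_id, SUM(amount) AS amount_total
    FROM #T1
    GROUP BY new_id
) AS g
CROSS APPLY (
    -- The "set": distinct old_id values for this new_id, concatenated once each.
    SELECT STRING_AGG(ISNULL(CONVERT(NVARCHAR(MAX), d.old_id), N'N/A'), ',')
               WITHIN GROUP (ORDER BY d.old_id) AS old_id_list
    FROM (SELECT DISTINCT t.old_id FROM #T1 AS t WHERE t.new_id = g.new_id) AS d
) AS ids;
-- expected: a | 1,2,3 | 150
```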