1

I need to group a table by a set of values together with all matching row numbers/id:s for each set. This operation must be done within the boundaries of SQL Server 2016.

Let's suppose I have the following table (Places):

ID Country City
1 Sweden Stockholm
2 Norway Oslo
3 Iceland Reykjavik
4 Sweden Stockholm

The result that I'm after (No curly-brackets because Stack Overflow thinks it's code, preventing me from posting):

ID Json
1,4 "Country":"Sweden","City":"Stockholm"
2 "Country":"Norway ","City":"Oslo"
3 "Country":"Iceland ","City":"Reykjavik"

In SQL Server 2017 the above result can be achieved with:

SELECT STRING_AGG(ID) ID, (SELECT Country, City FOR JSON PATH) Json
FROM Places GROUP BY Country, City

I managed to get a similar result in SQL Server 2016 with the code below. (But with my actual amount of data and columns, this solution is too slow.)

SELECT DISTINCT Country, City INTO #temp FROM Places
SELECT (SELECT ID From Places WHERE Country = P.Country AND City = P.City FOR JSON PATH) ID, 
(SELECT Country, City FOR JSON Path) Json FROM #temp P

Is there any more performance-effective way of achieving the result that I'm after?

EDIT: As people suggested me to try "FOR XML Path" I tried the code below. This gives the following error "Places.ID is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause ":

SELECT stuff((select ',' + cast(ID as varchar(max)) for xml path ('')), 1, 1, '') ID, 
(SELECT Country, City FOR JSON PATH) Json
FROM Places GROUP BY Country, City
Hwende
  • 599
  • 2
  • 10
  • 25
  • There is no SQL Server 2013. Please issue `SELECT @@VERSION;` and update the question. – Yitzhak Khabinsky Jun 24 '21 at 15:35
  • While asking a question you need to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example): (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT, T-SQL statements. (2) What you need to do, i.e. logic, and your attempt implementation of it in T-SQL. (3) Desired output based on the sample data in the #1 above. (4) Your SQL Server version (SELECT @@version;) – Yitzhak Khabinsky Jun 24 '21 at 15:36
  • @Yitzhak Khabinsky thank you, I've updated the post with the correct version. – Hwende Jun 24 '21 at 15:37
  • Loads of resource on this if you search a little https://www.codeproject.com/articles/691102/string-aggregation-in-the-world-of-sql-server - https://duckduckgo.com/?q=string+agg+vs+xml+path+performance&t=ffab&ia=web – Caius Jard Jun 24 '21 at 15:37
  • 1
    Also, looks like you actually want something more like JSON output, try https://www.red-gate.com/simple-talk/sql/learn-sql-server/json-support-in-sql-server-2016/ – Caius Jard Jun 24 '21 at 15:39
  • Dose this answer your question? [string_agg for sql server pre 2017](https://stackoverflow.com/questions/49361088/string-agg-for-sql-server-pre-2017) – Thom A Jun 24 '21 at 15:40
  • Google 'sql server for xml path' - lots of examples for exactly what you want. – Stu Jun 24 '21 at 15:43
  • @Caius Jard Thanks, will give it a read. But generally, it looks difficult to compare and translate it into my use case since he isn't grouping the result in a similar way. – Hwende Jun 24 '21 at 15:48
  • @Larnu No, unfortunately not. Using that code to concatenate the ID:s while only grouping the result based on the other columns will just yield an error saying that the ID column must be added to the GROUP BY clause. – Hwende Jun 24 '21 at 15:55
  • 1
    Then the implementation you did was incorrect, @Hwende . `FOR XML PATH`, and `STUFF` to remove the leading delimiter, *is* the pre-2017 solution. – Thom A Jun 24 '21 at 15:56
  • @Larnu I have added the implementation to the post. Can you tell what I'm doing wrong? – Hwende Jun 24 '21 at 16:04
  • 1
    There's no correlated subquery in that attempt, @Hwende . That's why. You need to use a correlated subquery when you are grouping. – Thom A Jun 24 '21 at 16:06
  • FOR XML PATH is unlikely to be hugely different to FOR JSON PATH, performance wise; it's essentially a kooky way of turning rows into CSV by turning it into an xml doc, and yanking all the xml tags in favour of commas so `123` becomes `1,2,3`.. i.e. more work to come round to a json flavored result than using JSON output ;) – Caius Jard Jun 24 '21 at 16:55

2 Answers2

2

Here's a solution you can try with for xml path

Basically select and group the json columns needed and using an apply, use the for xml path solution to aggregate the correlated ID values; because the outer query needs to refer to the output of the apply it needs to be aggregated also, I chose to use max

select max(x.Ids), (select country,city for json path) as [Json]
from t
outer apply (
    select Stuff((select ',' + Convert(varchar(10),t2.Id)
    from t t2
    where t2.city=t.city and t2.country=t.country
    for xml path(''),type).value('(./text())[1]','varchar(10)'),1,1,'') as Ids
)x
group by country,city

Working Fiddle

Stu
  • 30,392
  • 6
  • 14
  • 33
  • Thank you for the solution. It does work just as stated. Learned a thing or two from this. And this is probably the fastest way of achieving my desired result pre SQL Server 2017, right? However, unfortunately for me, it doesn't seem to perform any faster than the working code example in my post. – Hwende Jun 24 '21 at 16:37
  • 1
    I believe so yes - using for xml in this fashion is pretty much the go-to method. For smallish data sets it's acceptable; I did some performance testing not long ago when moving from 2016 to 2019, the string_agg() function is significantly more performant as the number of rows/aggregations scales. XML is just not performant unfortunately. – Stu Jun 24 '21 at 16:40
0

Here is another possible solution:

Declare @testTable Table (ID int, Country varchar(30), City varchar(30));
 Insert Into @testTable (ID, Country, City)
 Values (1, 'Sweden', 'Stockholm')
      , (2, 'Normway', 'Oslo')
      , (3, 'Iceland', 'Reykjavik')
      , (4, 'Sweden', 'Stockholm');

 Select Distinct
        ID = stuff((Select concat(',', tt2.ID)
                      From @testTable tt2
                     Where tt2.City = tt.City
                       And tt2.Country = tt.Country
                       For xml path (''), Type).value('.', 'varchar(10)'), 1, 1, '')
      , json = (Select Country, City For JSON PATH) 
   From @testTable      tt;

No idea if this will perform any better though. It is essentially the same - just using DISTINCT instead of GROUP BY.

Jeff
  • 512
  • 2
  • 8