0

I have a bit of an issue. I have a dataset with around 6000 records stored in a MSSQL 2016 database, where I need to concatenate each column, sorted on row basis and grouped by an ID.

It is no problem to sort the data within the column itself, but the position of the concatenated values is the key in the system that needs the data. So, if I merge four rows with the same ID, each row value must be in the same position throughout the whole resultset row.

To make it even more messy, I must be able to apply some logic to determine which row should be in position one.

If I use the the sort functionality in GROUP_CONCAT_DS (I'm using a semicolon delimiter), it sorts on the column values, which make the index position of each merge inconsistent related to the rows that holds the data.

Example data:

id  col1  col2
--------------------
1  foo    bar
1  lorem  ipsum
2  hi     cowboy

Expected output:

id, concat_col1, concat_col2
----------------------------
1  foo;lorem     bar;ipsum
2  hi            cowboy

Absolutely unacceptable output:

id  concat_col1  concat_col2
----------------------------
1   bar;foo      lorem;ipsum
2   hi           cowboy

In pseudo-SQL, this is a simplified example of what I want to achieve:

SELECT
    [id]
    ,dbo.GROUP_CONCAT_D([col1], ';') AS [concat_col1]
    ,dbo.GROUP_CONCAT_D([col2], ';') AS [concat_col2]
  FROM [hrm].[vEmploymentsPositions]
  GROUP BY [id]
  ORDER BY [id] asc, [col1] desc, [col2] asc

... But for loical reason the sort doesn't work that way.

How can I approach this issue and make a robust solution that fit our needs?

  • SQL Server's equivilent of `GROUP_CONCAT` is [`STRING_AGG`](https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-ver15). – Thom A Apr 02 '20 at 09:42
  • Do you have always 2 columns? – Zhorov Apr 02 '20 at 09:44
  • @Larnu Wasn't that introduced in SQL Server 2017? And no, I don't store delimited data, I need to export delimited data to an external system. – swaglord mcmuffin' Apr 02 '20 at 09:47
  • @Zhorov The number of columns is fixed, but the amount of rows fitted in the GROUP BY varies – swaglord mcmuffin' Apr 02 '20 at 09:48
  • @swaglordmcmuffin' the fact you used commas (`,`) to delimit your sample data in the question infered you had delimited data in the database. – Thom A Apr 02 '20 at 09:49
  • @Larnu Ah, sorry. I noticed an edit on it now. Thanks for the feedback! – swaglord mcmuffin' Apr 02 '20 at 09:52
  • @swaglordmcmuffin' there are a *lot* of duplicates that show the techniques you can use when `STRING_AGG` isn't available. In the end, all of them come from Aaron Bertrands articles on [Grouped Concatenation](https://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation), [ordering](https://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation-2) and a list of [practical uses](https://www.sentryone.com/blog/aaronbertrand/grouped-concatenation-use-cases). The SQLCLR version is the fastest, followed by the XML technique. – Panagiotis Kanavos Apr 02 '20 at 09:55
  • @PanagiotisKanavos Yeah, that's the one where I found GROUP_CONCAT_D, but for some reason it's not enough when it comes to determing which row that should be in position 0, but I'll re-read his articles again to see if I can get any new insights. In the other questions I've read I haven't been able to figure out how to sort the elemens which I concatenate, on row basis. But I'll give it another go! – swaglord mcmuffin' Apr 02 '20 at 09:59
  • @swaglordmcmuffin' the second article on ordering covers that. There are few cases where you need to aggregate on the *server* though, in most cases it's best done on the client – Panagiotis Kanavos Apr 02 '20 at 10:00
  • *"it's not enough when it comes to determing which row that should be in position 0"* There's nothing in your data to determine what order the data is in, that's why. Tables are stored in unordered heaps, so just beause you inserted a row "before" another doesn't mean its stored in that order, or that (without an explicit `ORDER BY`) the data will be returned in that order. – Thom A Apr 02 '20 at 10:07
  • @Larnu, yeah, I tried to explain it with my "psuedo-sql", or atleast try to describe what I need. Anyway, as Panagiotis says, it might be best to do it on the client. I might need to inform our integrations team and ask them to handle it in the integration/client side instead. Thanks alot guys! – swaglord mcmuffin' Apr 02 '20 at 10:17

0 Answers0