How to simulate STRING_AGG (contrary of split) function

Question

How to simulate string_agg function ?

I need get this

[value]
1
2
3

into this

1,2,3

I tried following

CREATE TYPE stringArray AS TABLE ([value] nvarchar(255))
GO

CREATE FUNCTION dbo.ufn_join 
(
    @table stringArray readonly, 
    @separator nvarchar(5) = ','
)
    RETURNS nvarchar(max)
AS
BEGIN
    RETURN stuff((select @separator + value from @table for xml path('')), 1, 1, '')
END
GO

SELECT dbo.ufn_join(
    (
    SELECT cast(1 as nvarchar(255)) as value
    UNION
    SELECT cast(2 as nvarchar(255)) as value
    UNION
    SELECT cast(3 as nvarchar(255)) as value
    )
    , DEFAULT
)

but I am getting an error

-- Error: Operand type clash: nvarchar is incompatible with stringArray

Only condition is that i do not want to use any kind of variables. CLR function is also totally fine, but there i have the same issue, how to insert return of select as a parameter to the function.

You can't pass TVPs inline. `DECLARE @t stringArray; INSERT @t VALUES ...`. If that syntax is not to your liking, this approach is simply dead in the water. T-SQL does not believe in syntactic sugar. — Jeroen Mostert, Sep 14 '17 at 12:02
There are a lot of duplicate questions, that end up pointing to [the same article by Aaron Bertrand](https://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation). The XML option is described there correctly. You *can't* convert it into a function though, or pass some table name to it. SQLCLR is the fastest option — Panagiotis Kanavos, Sep 14 '17 at 12:03
Possible duplicate of [Simulating group\_concat MySQL function in Microsoft SQL Server 2005?](https://stackoverflow.com/questions/451415/simulating-group-concat-mysql-function-in-microsoft-sql-server-2005) — TheGameiswar, Sep 14 '17 at 12:21
@TheGameiswar I agree that the solution is identical but the problem is different (kinda). STRING_AGG is a SQL Server 2017 function, GROUP_CONCAT is MYSQL. Few people doing a google search for alternatives to STRING_AGG will find and/or benefit from the article you posted. That's my $0.02 — Alan Burstein, Sep 14 '17 at 17:16
@AlanBurstein it's the same issue. Some people searched for `group_concat` some for `string aggregation`. SO questions and answers though have to be specific. All thoses solutions are covered in Aaron Bertrand's article, with explanations, pros/cons and performance benchmarks. In fact, I'd bet the answers in the linked question come from Bertrand's article either directly or indirectly — Panagiotis Kanavos, Sep 15 '17 at 07:13

score 2 · Accepted Answer · edited Jun 15 '21 at 15:47

2

Normally I use this link when I want to concat rows. There are several options how to do it, so here you can find inspiration on which approach you like the most. Be aware of XML PATH since it uses all of your CPU Processes and can max out your CPU to 100%.

Different concat approaches

Example from the link:

CREATE FUNCTION dbo.udf_select_concat ( @c INT )
RETURNS VARCHAR(MAX) AS BEGIN
DECLARE @p VARCHAR(MAX) ;
       SET @p = '' ;
    SELECT @p = @p + ProductName + ','
      FROM Northwind..Products
     WHERE CategoryId = @c ;
RETURN @p
END


SELECT CategoryId, dbo.udf_select_concat( CategoryId )
FROM Northwind..Products
GROUP BY CategoryId ;

edited Jun 15 '21 at 15:47

MackM

2,906
5
31
45

answered Sep 14 '17 at 12:39

SqlKindaGuy

3,501
2
12
29

1

On the contrary, XML PATH is the *fastest* method after SQLCLR. What you posted here is the quirky update which takes *twice* as long as XML. It doesn't `uses all of your CPU Processes` - there's no such thing. It *doesn't* execute in parallel. It *does* run on a single thread, just like the quirky update. It's twice as fast though, which means the quirky update actually takes more CPU time to complete – Panagiotis Kanavos Sep 14 '17 at 14:04
That is **not** a "quirky update"... A quirky update is something else entirely. And, no, XML is not faster than a real quirky update. In proper setting, nothing is faster than a quirky update. The only problem with quirky updates is that they rely on undocumented behavior (relying on the order of a clustered index w/o an ORDER BY clause), meaning that there is no guarantee that it will yield correct results. – Jason A. Long Sep 14 '17 at 15:35
2

A scalar udf with varchar(max) is going to be slow and will never enjoy a parallel execution plan. This approach has other shortcomings too; note Martin Smith's comments in this post: https://stackoverflow.com/questions/15138593/nvarchar-concatenation-index-nvarcharmax-inexplicable-behavior. The XML approach is a hack but it's efficient. Also, regarding "XML Path being the fastest" -- there are two XML path methods: one which protects against reserved XML characters (which includes PATH and VALUE) and the one the OP posted which does not. The former is 1/2 as fast as the latter. – Alan Burstein Sep 14 '17 at 17:47
@PanagiotisKanavos My example, was not that it was faster, but just to paste something from the link. Furthermore i can promise you, that XML PATH can make the CPU to max out to 100%. I have tested this myself on larger datasets. However with that in mention, i also tend to use XML PATH myself on smaller datasets. – SqlKindaGuy Sep 15 '17 at 06:22
@plaidDK I don't even have to promise that `XML PATH can make the CPU to max out to 100%` is wrong, because it doesn't make sense - the operator isn't multithreaded. The benchmarks are there, from one of the best SQL Server MVPs and authors. Probably the *teacher* of the author of the article you linked to. Which means, there's a problem with the informal way you used to measure performance. As for `was not that it was faster`, think about it. A single core operation is **ALWAYS** going to use the entire CPU **CORE** until it completes. Slower does mean more CPU – Panagiotis Kanavos Sep 15 '17 at 07:08
@plaidDK or it could be that the quirky update in the way it was used was so bad, that its thread was getting swapped out constantly so couldn't use the CPU efficiently. Anyway, if you want to compare cost/perf post the *execution plan* and the IO statistics. Not assumptions about the CPU or Task Manager screenshots. Database performance is affected primarily by IO, not CPU – Panagiotis Kanavos Sep 15 '17 at 07:16
@plaidDK or it could be that there was so much data that the server was *able to parallelize the entire query* when XML PATH was used, but was unable to do so with the quirky update. That's a big *plus* for XML PATH. – Panagiotis Kanavos Sep 15 '17 at 07:18

score 2 · Answer 2 · answered Sep 14 '17 at 17:52

TVP issue aside, your function will be profoundly faster and more efficient by turning it into an inline table valued function (commonly referred as an inline scalar function (iSF)). This article explains what I'm saying in detail: How to Make Scalar UDFs Run Faster (SQL Spackle)

CREATE FUNCTION dbo.ufn_join (@separator nvarchar(5))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT concatinatedTxt = 
  stuff((select @separator + someTxt from dbo.someTable for xml path('')), 1, 1, '');

score 1 · Answer 3 · answered Sep 14 '17 at 13:03

It's because you declared a type, set that type to a parameter and the tried to insert a table into this parameter (a different type).

Try this:

CREATE TYPE stringArray AS TABLE ([value] nvarchar(255))
GO

CREATE FUNCTION dbo.ufn_join 
(
    @table stringArray readonly, 
    @separator nvarchar(5) = ','
)
    RETURNS nvarchar(max)
AS
BEGIN
    RETURN stuff((select @separator + value from @table for xml path('')), 1, 1, '')
END
GO

DECLARE @table stringArray

INSERT INTO @Table
SELECT cast(1 as nvarchar(255)) as value
UNION
SELECT cast(2 as nvarchar(255)) as value
UNION
SELECT cast(3 as nvarchar(255)) as value

SELECT dbo.ufn_join(
    @Table
    , DEFAULT
)

How to simulate STRING_AGG (contrary of split) function

3 Answers3