3

I've always found the lack of a built-in aggregate function that just strings together values using some user defined separator, frustrating.

The closest thing I know of is the XML hack:

select s.string as [text()] 
from strings s 
for xml path('')

But in my opinion that's a very strange way of doing it and you still have to handle the case with trailing or leading separator junk. So, I thought I'd write a CLR aggregate:

select dbo.Fold(s.string, ', ') 
from strings s

That's kind of nice. Except this thing doesn't perform very well when the number of rows reaches 5000. I don't really expect to run it against that much data but to my surprise the performance of the aggregate degrades rather rapidly as the data set grows. I didn't understand why at first, then I realized that the way the UDF is handled in SQL Server is the problem.

I'm using an UDF with a StringBuilder class, because of this I have to implement IBinarySerialize. But for reasons I do not understand SQL Server is calling the Write/Read pair between each iteration of Accumulate. Obviously this creates a performance problem when the string gets big. Is there anyway to avoid this costly serialization?

FYI, the XML hack are orders of magnitudes faster.

John Leidegren
  • 59,920
  • 20
  • 131
  • 152

1 Answers1

1

What you are doing is implementing the MySQL GROUP_CONCAT

See these for inspiration...

Personally, I use the XML trick.

Community
  • 1
  • 1
gbn
  • 422,506
  • 82
  • 585
  • 676
  • The XML trick appears to be the right choice in this matter, it's just a horrible syntax. The group_concat stuff maintains a frequency table while running the aggregate, I'm guessing it performance is just as bad but it's an interesting approach. – John Leidegren Jun 13 '11 at 11:02