12

Im using the code from this MSDN page to create a user defined aggregate to concatenate strings with group by's in SQL server. One of my requirements is that the order of the concatenated values are the same as in the query. For example:

Value   Group
1       1
2       1
3       2
4       2

Using query

SELECT
  dbo.Concat(tbl.Value) As Concat,
  tbl.Group
FROM
  (SELECT TOP 1000
     tblTest.*
  FROM 
    tblTest
  ORDER BY 
    tblTest.Value) As tbl
GROUP BY
  tbl.Group

Would result in:

Concat  Group
"1,2"   1
"3,4"   2

The result seems to always come out correct and as expected, but than I came across this page that states that the order is not guaranteed and that attribute SqlUserDefinedAggregateAttribute.IsInvariantToOrder is only reserved for future use.

So my question is: Is it correct to assume that the concatenated values in the string can end up in any order?
If that is the case then why does the example code on the MSDN page use the IsInvariantToOrder attribute?

Magnus
  • 45,362
  • 8
  • 80
  • 118
  • Can't answer your MS question, but `GROUP_CONCAT` allows you to be deterministic about the concatenation. – Kerrek SB Jul 06 '11 at 13:15
  • @kerrek - note this is tagged `SQL Server` not `MySQL` – JNK Jul 06 '11 at 13:19
  • I'm afraid there is no `GROUP_CONCAT` in SQL server 2008 – Magnus Jul 06 '11 at 13:20
  • @JNK: Oh, isn't that standard SQL? Never mind in that case! – Kerrek SB Jul 06 '11 at 13:20
  • 1
    If your order is entirely dependent on the values that are concatenated you could mange the order in the C# code. I'm no C# coder but mucked around with it a bit so it produced the reverse order using insert instead of append on the `stringbuilder`. I guess you could sort the string in `Terminate()` before it is returned. – Mikael Eriksson Jul 06 '11 at 13:55
  • Thanks for the comment @Mikael, but this is part of a View that is used by multiple other queries, so it would be very messy to change all those to do it in code instead. – Magnus Jul 06 '11 at 14:09
  • @Magnus – Hu? I mean that you could modify the Concat C# code you got from MSDN. No need to modify the queries or the views. – Mikael Eriksson Jul 06 '11 at 14:12
  • @Mikael ok, I see what you mean. But I do not always need to order by the values being concatenated, but by some other field in query. – Magnus Jul 06 '11 at 14:26

2 Answers2

5

I suspect a big problem here is your statement "the same as in the query" - however, your query never defines (and cannot define) an order by the things being aggregated (you can of course order the groups, by having a ORDER BY after the GROUP BY). Beyond that, I can only say that it is based purely on a set (rather than an ordered sequence), and that technically the order is indeed undefined.

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • +1 - Search for `SQL Order` on SO and you will get a lot of posts about similar issues. – JNK Jul 06 '11 at 13:21
  • You can specify `order by` if you have `top` statement of an inner select and than group by on that. – Magnus Jul 06 '11 at 13:23
  • @Marc I've updated my question with a query that's using an inner select with `top` statement and `order by`. Will the end result of using group by and `Concat` on that still have an undefined order? – Magnus Jul 06 '11 at 13:34
  • 2
    @Magnus - all that that inner `ORDER BY` gives you is a sound definition for `TOP`. It *does not*, by any means, guarantee the order in which any outer operations will operate on the rows. Indeed, people used to use `TOP 100 PERCENT` and `ORDER BY` to "order" views - but then the optimizer was smartened to realize that it doesn't need to perform an ordering if 100% of the rows are required. If the same optimization hasn't occurred for inner queries yet, it may well do in the next release of SQL Server. – Damien_The_Unbeliever Jul 06 '11 at 13:34
  • @Marc, ok thanks. Strange thing is that it always does follow the inner order by in my test. – Magnus Jul 06 '11 at 13:40
  • 2
    If they implement that "optimization" in the next version maybe they will also implement `IsInvariantToOrder` to fix it. – Magnus Jul 06 '11 at 13:43
  • @Damien_The_Unbeliever I believe that optimization has been implemented for inner queries now. The intra-group order no longer persists when I try cross applying a subquery with ORDER BY inside. – Asad Saeeduddin Feb 27 '14 at 21:03
0

While the accepted answer is correct, I wanted to share a workaround that others may find useful. Warning: it involves not using a user-defined aggregate at all :)

The link below describes an elegant way to build a concatenated, delimited list using only a SELECT statement and a varchar variable. The upside (for this thread) is that you can specify the order in which the rows are processed. The downside is that you can't easily concatenate across many different subsets of rows without painful iteration.

Not perfect, but for my use case was a good workaround.

http://blog.sqlauthority.com/2008/06/04/sql-server-create-a-comma-delimited-list-using-select-clause-from-table-column/

Mike Monteiro
  • 1,427
  • 1
  • 14
  • 21
  • Although that can not be used with a group by as in my question. – Magnus Feb 12 '14 at 08:19
  • Yep, exactly - that was what I was trying to get at by saying "the downside is that..." – Mike Monteiro Feb 12 '14 at 13:49
  • How would you specify an order in this case? If this is inside a view or function, it is still impossible to specify an order, making this useless. – Asad Saeeduddin Feb 27 '14 at 21:08
  • To specify an order in which you process the rows for the delimited list, you would do: `DECLARE @listStr VARCHAR(MAX) SELECT @listStr = COALESCE(@listStr+',' ,'') + Name FROM Production.Product ORDER BY Name -- or whatever`. You are correct that this involves using procedural code (a variable declaration), which means it can't be used inside a view or function. But it can be used inside a SP. I would argue that this workaround is not "useless" just because it isn't relevant for your specific situation. – Mike Monteiro Feb 27 '14 at 21:12
  • @MikeMonteiro The problem here would be the `ORDER BY` clause, not the variable declaration, which AFAIK is OK in a scalar function. The reason I said this is useless is because there is no way to use this in subqueries, functions, or views, unless you skip ordering of the result, which is the whole point of the question. Trying to use this with a sproc never occurred to me. – Asad Saeeduddin Feb 28 '14 at 03:41