How concatenating the rows works

Question

Can someone please clarify how variables are executing here?

In query 1 how are all the rows concatenated with a comma? There is no while loop there.

In query 2 I assign empty strings directly to the query but it shows different results. Can someone explain this?

Table data:

select name from names

Output:

s
r
i
n
u

Query 1:

declare @var varchar(20)
set @var=''
select @var=@var+name+',' from names
select @var

Output:

s,r,i,n,u,

Query 2:

declare @var varchar(20)
select @var=''+name+',' from names
select @var

Output:

u,

FYI, that type of query with the syntax `SELECT @Variable = @Variable + ... FROM` is a [documented antipattern](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/select-local-variable-transact-sql?view=sql-server-ver15#c-antipattern-use-of-recursive-variable-assignment) and should be avoided. Instead use string aggregation to achieve the same results. In recent versions of SQL Server that would be by using `STRING_AGG`, however, in older versions you'll need to use `FOR XML PATH` (and `STUFF`) to achieve the same results. — Thom A, Jul 13 '22 at 08:06
To answer the actual question, the `@var=@var` in the first query makes it effectively recursive. Remember that SQL operations are set-based, so the same SELECT occurs for each row in the result set, therefore it keeps assigning more to `@var` until it gets to the end of the table. But as that documentation link says, this is definitely not guaranteed by the SQL Server engine - it may process the variable assignments in an unexpected order or may even skip it sometimes, so use it at your own risk. — ADyson, Jul 13 '22 at 08:15
@srini what you posted *doesn't* really work. I suspect you copied it from MySQL where it doesn't really work either and was a hack used before the introduction of `GROUP_CONCAT`. What you posted requires the query engine to retrieve and process rows in a loop, the worst possible case. Databases don't use loops, they use sets, hashes and parallelization. MySQL allows such hacks because it didn't have the relevant functionality for years (decades even) so by the time they added more intelligent query processing and speedups, they couldn't abandon the hacks without breaking everyone's code — Panagiotis Kanavos, Jul 13 '22 at 08:32
@srini it's worth noting that MySQL doesn't really support those hacks (non-aggregate columns in GROUP BY queries is another case), they tolerate them. They do warn these can be removed in the future. They aren't tested either, which means even a minor release can dramatically reduce the performance of hacks. I think 5.7 caused a huge perf degradation to the GROUP BY case. — Panagiotis Kanavos, Jul 13 '22 at 08:35

score 4 · Answer 1 · answered Jul 13 '22 at 08:17

As I mentioned in the comment, the logic you have in your query is a documented antipattern. Effectively you are relying on that query is row in a row by row basis, and for each row the variable is updated.

So, you are hoping, that the variable is first set to the value '' + 's,' (which is 's,'), then for the second row, the prior rows value of variable would be used ('s,') and concatenated to the next ('r'), resulting in 's,r,'. For the third row, again use the prior rows value of variable ('s,r,') and concatenate it to the next ('i'), resulting in 's,r,i'. Repeat until you get to the end of the dataset.

Per the documentation, however, there is no guarantee that'll actually happen though:

In this case, it is not guaranteed that @Var would be updated on a row by row basis. For example, @Var may be set to initial value of @Var for all rows. This is because the order and frequency in which the assignments are processed is nondeterminant. This applies to expressions containing variables string concatenation, as demonstrated below, but also to expressions with non-string variables or += style operators. Use aggregation functions instead for a set-based operation instead of a row-by-row operation.

So this means you could simply end up with a single delimited value like 'u,', or perhaps some missing values (maybe 'n,u,') due to when the rows and variable assignments were processed.

Instead, as the documentation also states, use string aggregation. On all (fully) support versions of SQL Server, that would be STRING_AGG:

SELECT @var = STRING_AGG(name,',')
FROM dbo.Names;

If you are on an older version of SQL Server, then you would need to use the "old" FOR XML PATH (with STUFF) method, like shown in this question.

I bet this comes from an attempt to use MySQL hacks in SQL Server. These don't work well in MySQL either, they're just accidents that can't be solved without breaking a ton of badly written web site code. — Panagiotis Kanavos, Jul 13 '22 at 08:38
I've seen plenty of examples of people using the anti-pattern in SQL Server as well unfortunately, @PanagiotisKanavos . I'd *suggest* that the main culprit I see it are dynamic pivots (maybe due to [this answer](https://stackoverflow.com/a/38505375/2029983)). — Thom A, Jul 13 '22 at 08:41
@PanagiotisKanavos See also https://stackoverflow.com/a/70491646/14868997 and https://stackoverflow.com/questions/15138593/nvarchar-concatenation-index-nvarcharmax-inexplicable-behavior/15163136#15163136 and https://dba.stackexchange.com/a/132709/220697 — Charlieface, Jul 13 '22 at 09:07

How concatenating the rows works

1 Answers1