3

I am trying to concatenate LoadIds for a user using For XML path which is a portion of my whole query, I verified that the maximum time gets elapsed in calculating this concatenated LoadId column. Below is the syntax, can anyone suggest a way to rewrite this efficiently?

SELECT 
    Col1, col2, 
    LoadIds = STUFF((SELECT ' , ' + CAST([LoadId] AS varchar(5))
                     FROM Table1 AS t1
                     WHERE t1.[UserId] = [t2].[UserId]
                     FOR XML PATH ('')), 1, 2, '')
FROM 
    Table1 AS t2
GROUP BY 
    [UserId]
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Naina
  • 31
  • 1
  • 2
  • What exactly about the performance is a problem for you? This looks like a very typical `FOR XML PATH` query in SQL Server. Do you have an index on `UserId`? – Tim Biegeleisen Apr 17 '18 at 01:30
  • @TimBiegeleisen If I comment calculation of LoadIds, performance of the query is within 20 seconds and with LoadIds it goes upto a minute. I don't have index on UserId and I don't have permission to create – Naina Apr 17 '18 at 01:46
  • @Naina... Is there any typo in posted query ? cause i see only `userid` in `group by` clause but the `select` statement has nos of `expression`. – Yogesh Sharma Apr 17 '18 at 04:16
  • If you don't have the permission to create an index, this will not help you probably, but there is the chance to use a [CLR function](https://stackoverflow.com/q/48783205/5089204). – Shnugo Apr 17 '18 at 07:20
  • @Naina, this one is also a good one for SQL CLR [GROUP_CONCAT](https://github.com/orlando-colamatteo/ms-sql-server-group-concat-sqlclr) – Biju jose Apr 17 '18 at 07:39

2 Answers2

2

1). I would try to collect the required data in a temp table or table variable, create an index on it and then play with the concatination.

2). FOR XML PATH works good for small sets of records, for large sets I would try a recursion.

declare @T table (
    UserId     int        not null,
    RowNumber  int        not null,
    LoadId     varchar(5) not null

    primary key clustered (UserId, RowNumber)
);

insert into @T
select
    UserId,
    row_number() over(partition by UserId order by LoadId),
    CAST(LoadId AS varchar(5))    
from
    Table1 ;


with cte (UserId, RowNumber, LoadIds) as 
(
    select 
        UserId, 
        RowNumber,
        LoadIds = convert(varchar(8000), LoadId)
    from @T
    where RowNumber = 1

    union all

    select 
        t.UserId, 
        t.RowNumber,
        convert(varchar(8000), cte.LoadIds + ', ' + t.LoadId)
    from 
        cte inner join @T t on t.UserId = cte.UserId and t.RowNumber = cte.RowNumber + 1
)
select UserId, LoadIds = max(LoadIds) from cte group by UserId;
Vadim Loboda
  • 2,431
  • 27
  • 44
  • 3
    may be you should look this,https://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation – Biju jose Apr 17 '18 at 07:41
  • But I've heard the opposite opinion and seen other results as well. That is why I wrote "I would try..." – Vadim Loboda Apr 17 '18 at 07:57
  • I usually keep some cheat sheets with me , like commonly used techniques to solve these type of problem , for grouped concatenation the best way to go is xml (some caveats are there which you should keep in mind) and CLR. A great read by late Dwain camps is here [High perf T SQL code patterns](https://dwaincsql.com/2015/05/27/high-performance-t-sql-using-code-patterns/) – Biju jose Apr 17 '18 at 07:58
  • even I tried r-cte some times until I came across with this. That's why pointed it out, now onwards you will keep an eye on r-cte :) – Biju jose Apr 17 '18 at 08:00
  • 1
    For others looking for a solution, I found that creating an index on the temp table as suggested by @VadimLoboda really sped up my query – RoyalSwish Jun 28 '21 at 13:21
0

One thing you can do is select the distinct user ids before doing the string aggregation:

SELECT t2.UserId, 
        STUFF((SELECT ' , ' + CAST([LoadId] AS varchar(8000))
               FROM Table1 t1
               WHERE t1.[UserId] = [t2].[UserId]
               FOR xml PATH ('')
              ), 1, 2, ''
             ) as LoadIds
FROM (SELECT DISTINCT userId
      FROM Table1 t2
     ) t2;

For performance, an index on table1(UserId, LoadId) would also help.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786