Concatenate one field after GROUP BY

Question

This question have been asked many times in SO but none of the answers is satisfying to my situation.

I am dealing with a DataObjectVersions table that contains multiple versions for around 1.2 million unique objects (and increasing). I need to concatenate changes from a specific field for each unique object.

Right now I am using the solution with the XML Path presented in Q3 but running such a query on this table is a total performance disaster. SQL Server started to retun Data after 19mn. Knowing that this data will be than joined twice, you can imagine the impact.

I am looking for the most efficient scalability-aware way to concatenate the values of the same fields of different rows grouped by an other field (which is not of course a key). To be more precise, this is used within a view in a Datawarehouse.

EDIT:

I tried to simplify the description but here is a complete overview I have multiple tables with the following columns

   [ID]
   [CreatedTime]
   [CreatedBy]
   [DeletedTime]
   [DeletedBy]
   [ResourceId]
   [AccountId]
   [Type]

A view is used to return the union of all records from all tables, which will still return the same columns (described in my questions by the versions table). [ResourceId] and [AccountId] are a unique composite identifier of an object (Group membership, System account, etc.. a resource assignment specifically). The [Type] is used to identify different levels (like Read/Write/Execute in the case of a file assignment)

All other fields contain the same values (in different tables) for different unique objects. I need to get the objects and concatenate the values of the [Type] column. All the row are processed afterward and the ([ResourceId],[AccountId]) combination must be unique (not the case when different types exists).

EDIT 2:

I am using this function:

CREATE FUNCTION [dbo].[GetUniqueType]
(
    @ResourceId as uniqueidentifier,
    @Account as uniqueidentifier
)
RETURNS nvarchar(100)
AS
BEGIN   
    return STUFF((select ',' + raType.Type from vwAllAssignments raType where raType.AccountId = @Account and raType.ResourceId = @ResourceId and raType.DeletedBy is null for xml path('')), 1,1,'')
END

GO

vwAllAssignments is the view returning the union of all tables rows.

Finally I am selecting

SELECT [CreatedTime]
      ,[DeletedTime]
      ,[DeletedBy]
      ,[ResourceId]
      ,[AccountId]
      ,dbo.GetUniqueType([ResourceId],[AccountId]) AS [Type]
FROM vwAllAssignments
GROUP BY [ResourceId], [AccountId], [CreatedTime], [DeletedTime], [DeletedBy]

Do you mean having something like "Object", "V1, V2, V3, V4"? — Diego, Nov 30 '12 at 14:47
Yes, but for a specific column only, not all columns like concatenating types or change dates from different versions — Moslem Ben Dhaou, Nov 30 '12 at 14:51
Please show your database layout. I use concatenation in SQL Server 8 on larger tables and get much better performance. I suspect some well placed indexes will solve the performance problem. — Gordon Linoff, Nov 30 '12 at 15:06
can you setup an sqlfiddle with some rows of your vwAllAssignments view? — Saic Siquot, Nov 30 '12 at 16:22
If the database is used for OLAP, which is heavily denormalized, could you store the pre-formatted data in a table, and then simply join with it? You could also consider creating an indexed view that returns the column, insteaf of using the function. Note: it depends on the circumstances, but, in my experience, concatenating values from multiple rows into a single column is often a presentation issue (for example, if you needed these values just to display them on a report) and, as such, it should not be delegated to the database. — Diego, Nov 30 '12 at 18:10
Have you tried to do the string concatenation directly in the field list instead of in a UDF? I did some testing and it is faster without the UDF. — Mikael Eriksson, Nov 30 '12 at 19:26
An index something like this will also be helpful. `create index IX_vwAllAssignments on vwAllAssignments(AccountId, ResourceId, DeletedBy) include(Type)` — Mikael Eriksson, Nov 30 '12 at 19:41
@MikaelEriksson: WHat do you mean by UDF? sorry. I will try the index as well — Moslem Ben Dhaou, Nov 30 '12 at 19:55
The UDF (User Defined Function) is `GetUniqueType` I will add an answer with some code you can try. — Mikael Eriksson, Nov 30 '12 at 20:00

score 4 · Accepted Answer · answered Nov 30 '12 at 20:02

Try this:

SELECT [CreatedTime]
      ,[DeletedTime]
      ,[DeletedBy]
      ,[ResourceId]
      ,[AccountId]
      ,STUFF((select ',' + raType.Type 
              from vwAllAssignments raType 
              where raType.AccountId = vwAllAssignments.AccountId and 
                    raType.ResourceId = vwAllAssignments.ResourceId and 
                    raType.DeletedBy is null 
              for xml path('')), 1,1,'') AS [Type]
FROM vwAllAssignments
GROUP BY [ResourceId], [AccountId], [CreatedTime], [DeletedTime], [DeletedBy]

And an index like this should be helpful.

create index IX_vwAllAssignments on vwAllAssignments(AccountId, ResourceId, DeletedBy) include(Type)

Concatenate one field after GROUP BY

1 Answers1

Linked

Related