1

I am using SQL Server 2008.

I have a table of items and the categories they belong to.

-----------------------------------------------------------------
ItemId | Category1  | Category2  | Category3  | Category4
-----------------------------------------------------------------
1      | Val1, Val2 | Val3, Val4 | Val5, Val6 | Val7, Val8
1      | Val9       |            |            |     
1      | Val10      | Val11      |            |

*Edit1

There is no index on the table and ItemId is not the primary key. Each item might belong to more than one category.

I want to Select distinct ItemId from the table and have all possible Categories as comma separated for each column.

Sample Output -

------------------------------------------------------------------------------
ItemId | Category1               | Category2         | Category3  | Category4
------------------------------------------------------------------------------
1      | Val1, Val2, Val9, Val10 | Val3, Val4, Val11 | Val5, Val6 | Val7, Val8

I am able to achieve this using STUFF and FOR XML PATH using the below query. However, I had a look at the execution plan and it seemed like probably not the best way to do it. I am using 4 different SELECT statements for each of the categoryies. Can it all be merged into one? I am looking for the most efficient query to achieve the same result as the below query.

SELECT  DISTINCT
        t1.tblItemId,
        STUFF((
        SELECT DISTINCT ',' + t2.Category1
        FROM tblCategory t2
        WHERE t1.tblItemId = t2.tblItemId 
        FOR XML PATH ('')), 1, 1, '') AS Category1,
        STUFF((
        SELECT DISTINCT ',' + t2.Category2 
        FROM tblCategory t2
        WHERE t1.tblItemId = t2.tblItemId 
        FOR XML PATH ('')), 1, 1, '') AS Category2,        
        STUFF((
        SELECT DISTINCT ',' + t2.Category3 
        FROM tblCategory t2
        WHERE t1.tblItemId = t2.tblItemId 
        FOR XML PATH ('')), 1, 1, '') AS Category3,        
        STUFF((
        SELECT DISTINCT ',' + t2.Category4 
        FROM tblCategory t2
        WHERE t1.tblItemId = t2.tblItemId 
        FOR XML PATH ('')), 1, 1, '') AS Category4
FROM tblCategory t1
where t1.tblItemId IS NOT NULL

*Edit1 - Currently multiple values don't exist in each row delimited by commas. I have used them as it might be a possibility in the future and no one is restricting it. I am using a Common Table Expression to create a temp table with the desired output that can be used for my front end.

Vishwas
  • 1,398
  • 8
  • 20
  • 33
  • 1
    This query can be simplified using [query()](https://learn.microsoft.com/en-us/sql/t-sql/xml/query-method-xml-data-type) method but performance will be worse than your current approach.. – Pரதீப் Jul 03 '17 at 06:30
  • 4
    This is a terrible database design. Read [Is storing a delimited list in a database column really that bad?](http://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad), where you will see a lot of reasons why the answer to this question is **Absolutly yes!** – Zohar Peled Jul 03 '17 at 06:31
  • Hi Zohar, The values shown are for example purpose. Although comma separated values don't exist at the moment it might be a possibility. I am currently using a Common Table Expression of the above query. So in the database there are no comma delimited values. I am merely forming a table for use in my front end. – Vishwas Jul 03 '17 at 06:45
  • I was referring to the first table in your post... If that's not the raw data, please edit your question to show the raw data, preferably as DDL+DML so that we can use it to copy to a test environment. – Zohar Peled Jul 03 '17 at 07:16
  • The first table is a possible scenario for the raw data. So we can go with that. I know it is a bad design but I'm looking to see if anyone can help me optimize the query without altering the table. Thanks. – Vishwas Jul 03 '17 at 07:48
  • What about duplicate values, which would appear more than once in your list of values? Btw; You should **never store more than one value in one cell!!** In this case **never** really means **never ever!**. In all cases, where you might need the values separately,there's no excuse or reason for this. If you need comma separated values (for presentation), you get them *on-the-fly*. If this is a performance issue, you can *pre-calculate* these values (read about *triggers* and/or *persistant computed columns*). – Shnugo Jul 03 '17 at 10:25

1 Answers1

0

You can use SQL CRL to write your own STRING_AGG (appears in SQL 2016). or use existing library (for example http://groupconcat.codeplex.com/)

Mikhail Lobanov
  • 2,976
  • 9
  • 24