2

I have a table with just product ID's and category ID's (products can be in more than one category). How can I flatten the category ID's into a product column so I end us with this:

id | name | desc | categories
1 | test1 | lorem | 1,3,4,23
2 | test2 | ipsom | 4,6,24

It is like I need to loop into a separate table for the categories column. How can I do this or is there a better way?

TruMan1
  • 33,665
  • 59
  • 184
  • 335

4 Answers4

7

I created an CLR aggregate function that takes a varchar column and returns all its values separated by commas. In other words, it joins several strings into a comma-separated list. I am sure its performance is way better than any T-Sql trick.

As any aggregate function, it can be used in combination with group by. For example:

SELECT id, name, desc, JoinStrings(CONVERT(VARCHAR(20), category_id))
FROM product p
INNER JOIN category_products c ON p.category_id = c.category_id
GROUP BY id, name, desc

Here's the C# code to create the CLR assembly into Sql Server 2008:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;


[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToDuplicates=false, IsInvariantToOrder=false, IsInvariantToNulls=true, MaxByteSize=-1)]
public struct JoinStrings : IBinarySerialize
{
    private char[] sb;
    private int pos;
    public void Init()
    {
        sb = new char[512000];
        pos = 0;
    }

    public void Accumulate(SqlString Value)
    {
        if (Value.IsNull) return;
        char[] src = Value.ToString().ToCharArray();
        Array.Copy(src, 0, sb, pos, src.Length);
        pos += src.Length;
        sb[pos] = ',';
        pos++;
    }

    public void Merge(JoinStrings Group)
    {
        Accumulate(Group.Terminate());
    }

    public SqlString Terminate()
    {
        if (pos <= 0) 
            return new SqlString();
        else
            return new SqlString(new String(sb, 0, pos-1));
    }

    public void Read(System.IO.BinaryReader r)
    {
        this.Init();
        pos = r.ReadInt32();
        r.Read(sb, 0, pos);
    }

    public void Write(System.IO.BinaryWriter w)
    {
        w.Write(pos);
        w.Write(sb, 0, pos);
    }
}

Here's the code to create the function (although deploying from Visual Studio should do it automatically):

CREATE AGGREGATE [dbo].[JoinStrings]
(@s [nvarchar](4000))
RETURNS[nvarchar](max)
EXTERNAL NAME [YouAssemblyName].[JoinStrings]
Diego
  • 18,035
  • 5
  • 62
  • 66
  • Would be nice if it supported sorting on a column and a custom separator like in MySQL: `GROUP_CONCAT(klascode,'(',name,')' ORDER BY klascode ASC SEPARATOR ', ')` – Jan Oct 30 '18 at 21:43
1

I would suggest using a Recursive CTE. I believe that it would be something like this:

select productid, categoryid, 
    row_number() over (partition by id order by categoryid) as rownum
into #tabletorecurse
from TABLENAME

with finaloutput as
(
    select productid as id, name, desc, categoryid as categories, rownum
    from #tabletorecurse
        join PRODUCTTABLE
            on PRODUCTTABLE.id = #tabletorecurse.productid
    where rownum = 1

    union all

    select tr.id, tr.name, tr.desc, 
        finaloutput.categories + ', ' + tr.categoryid, tr.rownum
    from #tabletorecurse as tr
        join finaloutput 
            on finaloutput.rownum + 1 = tr.rownum 
                and finaloutput.id = tr.productid
)
select id, name, desc, categories
from finaloutput
    join 
    (
        select max(rownum) as maxrow, id
        from finaloutput
        group by id 
    ) as maxvalues
       on maxvalues.id = finaloutput.id 
           and maxvalues.maxrow = finaloutput.rownum
Justin Pihony
  • 66,056
  • 18
  • 147
  • 180
1

There's no in-built way to do it in MSSQL.

Simulating group_concat MySQL function in Microsoft SQL Server 2005? has a good description of how to go about implementing a workaround.

Community
  • 1
  • 1
hkf
  • 4,440
  • 1
  • 30
  • 44
1

Use a function.
This does a lookup to text so you will need to adapt.
The COALESCE is just to put a ,.
This is from a large scale production application - it works and it fast.
Function was questioned by JustinPony as function is slow
I am hitting some tables of million of records but only returning 100 rows.
The function is only applied to the hundred rows.

usage:

select top 5 sID, ( select [dbo].[JoinMVEnum](docSVsys.sID, '140') ) as [Flag Issue]
from docSVsys

function

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[JoinMVText]
(
   @sID int,
   @fieldID tinyint
)
RETURNS VARCHAR(MAX)
AS 
BEGIN
    DECLARE @MVtextList varchar(max)
    SELECT @MVtextList = COALESCE(@MVtextList + '; ', '') + docMVtext.value
    FROM docMVtext with (nolock) 
    WHERE docMVtext.sID = @sID and fieldID = @fieldID
    RETURN @MVtextList
END

GO
paparazzo
  • 44,497
  • 23
  • 105
  • 176
  • 2
    I have never heard of a SQL scalar function being fast? SQL is a SET based language, and generally stinks at one at a time procedural things. At least when you get into 100's of millions of rows (which the OP might or might not be hitting) – Justin Pihony Mar 20 '12 at 00:43
  • @JustinPihony Test it versus the approach you propose before you criticize. I am using this against 1 million rows and 6 mv columns with sub second response time. Pushing data into #temp and union are not that fast either. – paparazzo Mar 20 '12 at 13:38
  • CTE's run as set logic, though. Functions run once per row. Either way works, though, and the OP picked one :) – Justin Pihony Mar 20 '12 at 13:40
  • Yes CTE's run as set. Yes a function runs per row but that is per row of output NOT per table row. Against my data of 1.7 million your step 1 alone takes 2 seconds and my function based query runs in 0.4 seconds. You have 1 #temp, 5 selects, 3 joins, and 1 union. So a set operation is fast - that is just a lot of set operations. Retrieving all 1.7 millions rows the set based would be close. Retrieve 1000 records against a large table and functions based wins. The nature of most queries is to retrieve subset of the table. – paparazzo Mar 20 '12 at 14:45