
This is the schema:

User_ID  Page_ID    Timestamp
1        48,51,94   7/26/2017 8:30
2        42,11,84   7/26/2017 9:40
3        4,16,24    7/26/2017 16:20
4        7,2,94     7/27/2017 8:00
1        48,22,94   7/27/2017 13:50
2        42,11      7/27/2017 14:00
3        4,24       7/27/2017 18:15

The code below gives the aggregate count of page IDs per user (non-unique on purpose):

SELECT User_ID, sum(len(Page_ID) - len(replace(Page_ID, ',', '')) +1) as TotalPageCount
FROM DBTABLE
group by User_ID

Output:

User_ID TotalPageCount
1   6
2   5
3   5
4   3
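
The expression simply counts the delimiters per row and adds one; for example:

-- '48,51,94' has 2 commas, so 2 + 1 = 3 page IDs on that row
SELECT LEN('48,51,94') - LEN(REPLACE('48,51,94', ',', '')) + 1 AS PageCountForRow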

However, I am looking to add a (comma-separated) column with the page count per page ID per user ID, i.e. a column formatted as page id 1: count, page id 2: count, etc. (essentially a dictionary). It can be a different format, but it needs to be descriptive at the page ID level, with its respective count.

Something like this:

User_ID  PageIDCount              TotalPageCount
1        48:2, 51:1, 94:2, 22:1   6
2        42:2, 11:2, 84:1         5
3        4:2, 16:1, 24:2          5
4        7:1, 2:1, 94:1           3

Your help is greatly appreciated!


Edit:

As per SeanLange's amazing solution, you can change the definition of MyCTE to the one below in order to avoid using any user-defined functions:

-- Split the comma-separated Page_ID values inline via XML, then count per user/page
select user_id, page_id, page_count = count(*)
FROM (
    SELECT user_id, Split.a.value('.', 'NVARCHAR(max)') AS page_id
    FROM (
        -- turn '48,51,94' into '<M>48</M><M>51</M><M>94</M>' so it can be parsed as XML
        SELECT user_id, CAST('<M>' + REPLACE(page_id, ',', '</M><M>') + '</M>' AS XML) page_id
        FROM #temp
        ) AS A
    CROSS APPLY page_id.nodes('/M') AS Split(a)
) x
group by user_id, page_id
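
For reference, on SQL Server 2016 or later the built-in STRING_SPLIT function removes the need for both a user-defined splitter and the XML cast (not an option on 2014, per the comments, but noted for completeness). A minimal sketch against the same #temp table:

-- requires SQL Server 2016+ (compatibility level 130); STRING_SPLIT returns one row per value in a column named "value"
select user_id, page_id = s.value, page_count = count(*)
from #temp
cross apply string_split(page_id, ',') s
group by user_id, s.value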
edost4
  • I am assuming we can use some type of a window function here, I just wasn't able to figure out which or how – edost4 Sep 11 '17 at 16:32
  • I would urge you to read your question and ask yourself if you think you would be able to answer this. There just isn't any information to go on here. This would be a great place to start. http://spaghettidba.com/2015/04/24/how-to-post-a-t-sql-question-on-a-public-forum/ – Sean Lange Sep 11 '17 at 17:02
  • I hear you. Corrected – edost4 Sep 11 '17 at 17:38
  • Is there a reason you are storing your data like this? It violates 1NF and is going to cause you unbelievable amounts of pain. – Sean Lange Sep 11 '17 at 18:51
  • Totally agree with Sean. Can you at least use XML and remove it for visualization? I started a solution, but I quit after a few minutes of frustration. Still tuned hoping to see some magic trick – Horaciux Sep 11 '17 at 18:54
  • @SeanLange What is 1NF? – irfandar Sep 11 '17 at 19:04
  • Short answer is that the data is taken from a source in json format, which I ingest into a SQL Server table and then stitch with variety of other data for that same user ID. I could parse it when ingesting to have each value in a separate line if that is more helpful. – edost4 Sep 11 '17 at 19:05
  • @lostmylogin 1NF is first normal form. https://en.wikipedia.org/wiki/First_normal_form It is the basics of data normalization. – Sean Lange Sep 11 '17 at 19:06
  • @edost4 What version of sql server are you using? – irfandar Sep 11 '17 at 19:06
  • @Sean Lange SELECT @@version Microsoft SQL Server 2014 (SP2-CU3) (KB3204388) - 12.0.5538.0 (X64) Dec 15 2016 17:52:48 Copyright (c) Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 6.3 (Build 9600: ) – edost4 Sep 11 '17 at 19:16

2 Answers


Wow, this is a nightmare. You are going to need a string splitter to start with. My personal favorite is this one: http://www.sqlservercentral.com/articles/Tally+Table/72993/ There are a number of other excellent choices here: https://sqlperformance.com/2012/07/t-sql-queries/split-strings

Starting with your data you will need to do something like this.

declare @Something table
(
    User_ID int
    , Page_ID varchar(100)
    , MyDate datetime
)

insert @Something
select 1, '48,51,94', '7/26/2017 8:30' union all
select 2, '42,11,84', '7/26/2017 9:40' union all
select 3, '4,16,24', '7/26/2017 16:20' union all
select 4, '7,2,94', '7/27/2017 8:00' union all
select 1, '48,22,94', '7/27/2017 13:50' union all
select 2, '42,11', '7/27/2017 14:00' union all
select 3, '4,24', '7/27/2017 18:15'

-- split each Page_ID list into rows, then count views per user/page
select User_ID
    , Page_ID = x.Item
    , PageCount = count(*)
from @Something s
cross apply dbo.DelimitedSplit8K(s.Page_ID, ',') x
group by User_ID
    , x.Item
order by User_ID
    , x.Item
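
Against the sample data above, this intermediate step should produce one row per user/page pair, roughly:

User_ID  Page_ID  PageCount
1        22       1
1        48       2
1        51       1
1        94       2
2        11       2
2        42       2
2        84       1
3        16       1
3        24       2
3        4        2
4        2        1
4        7        1
4        94       1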

This gets the data with the counts you want. From there you are going to have to shove this back into the denormalized structure that you want. You can do this with FOR XML. Here is an article that explains how to do that part of this. Simulating group_concat MySQL function in Microsoft SQL Server 2005?

-----EDIT-----

OK, here is the complete working solution. You have obviously been working hard at trying to get this sorted out. I am using the DelimitedSplit8K function here so I don't have to inline XML the way your solution does.

with MyCTE as
(
    -- split the delimited Page_ID values and count views per user/page
    select User_ID
        , Page_ID = x.Item
        , PageCount = count(*)
    from @Something s
    cross apply dbo.DelimitedSplit8K(s.Page_ID, ',') x
    group by User_ID
        , x.Item
)
, GroupedPageViews as
(
    -- reassemble the per-page counts into one "page:count" delimited string per user
    select c.User_ID
        , sum(c.PageCount) as TotalPageCount
        , PageViews = STUFF((select ', ' + convert(varchar(4), c2.Page_ID) + ':' + convert(varchar(4), c2.PageCount)
            from MyCTE c2
            where c.User_ID = c2.User_ID
            order by c2.Page_ID
            for xml path('')), 1, 2, '')
    from MyCTE c
    group by c.User_ID
)

select gpv.User_ID
    , gpv.PageViews
    , gpv.TotalPageCount
from GroupedPageViews gpv
join MyCTE c on c.User_ID = gpv.User_ID
group by gpv.PageViews
    , gpv.User_ID
    , gpv.TotalPageCount
order by gpv.User_ID

This will return your data like this.

User_ID PageViews               TotalPageCount
1       22:1, 48:2, 51:1, 94:2  6
2       11:2, 42:2, 84:1        5
3       16:1, 24:2, 4:2         5
4       2:1, 7:1, 94:1          3
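
If you are on SQL Server 2017 or later, the STUFF ... FOR XML PATH step can be replaced with STRING_AGG. A rough equivalent, reusing the same MyCTE:

-- SQL Server 2017+ only: STRING_AGG builds the delimited "page:count" string directly
select User_ID
    , PageViews = string_agg(concat(Page_ID, ':', PageCount), ', ') within group (order by Page_ID)
    , TotalPageCount = sum(PageCount)
from MyCTE
group by User_ID
order by User_ID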
Sean Lange
  • I don't have the proper permissions to "create function", which seems to be necessary for using what you suggest. Is there a way to create a temp function similar to a temp table? Also, I am not sure how to "shove this back". – edost4 Sep 12 '17 at 13:25
  • You have delimited data which is horrific to work with so you are going to need a way to split it. The function I posted is a straight forward way to do that. Look at the second link I posted, you could leverage the XML function without it being a function. And the "shove it back together" is taking the parsed data and forcing it back into your denormalized structure. The link about group_concat explains how to do that. The reason you are struggling with this task is because the data structure is so poor. If the data structures weren't like this you wouldn't have these struggles. – Sean Lange Sep 12 '17 at 13:28
  • Unfortunately code in comments does not work well. You will have to split this data first so you can get your count. Then you need to reassemble the delimited values using FOR XML. I gave you the code for the hard part, splitting this and getting counts. From there I sent you a link to an example of how to turn rows into a delimited string. That part should be pretty simple. – Sean Lange Sep 12 '17 at 16:42
  • I split it successfully and concatenated it back as well, but the output is not what I was looking for, as seen in the fiddle (you can see the splitting part in the subquery): http://sqlfiddle.com/#!6/55cc4/7 Perhaps my request was too hard. Having Page ID column be just the unique page ids associated to the user would work as well – edost4 Sep 12 '17 at 18:24

Here you go

SELECT DISTINCT User_Id
    , (
        SELECT CAST(t.Value AS VARCHAR) + ':' + CAST(COUNT(t.value) AS VARCHAR) + ', '
        FROM TBL_46160346_DBTABLE ii
        CROSS APPLY (
            SELECT *
            FROM fn_ParseText2Table(Page_ID, ',')
            ) t
        WHERE pp.User_Id = ii.User_Id
        GROUP BY User_Id
            , VALUE
        ORDER BY User_Id
        FOR XML PATH('')
        ) PageIDCount
    , (
        SELECT  COUNT(*)
        FROM TBL_46160346_DBTABLE ii
        CROSS APPLY (
            SELECT *
            FROM fn_ParseText2Table(Page_ID, ',')
            ) t
        WHERE pp.User_Id = ii.User_Id
        GROUP BY User_Id
        ) TotalPageCount
FROM TBL_46160346_DBTABLE pp

fn_ParseText2Table function

CREATE FUNCTION [dbo].[fn_ParseText2Table] (
    @p_SourceText VARCHAR(8000), @p_Delimeter VARCHAR(10) = ',' --default comma
    )
RETURNS @retTable TABLE (Value BIGINT)
AS
BEGIN
    DECLARE @w_Continue INT, @w_StartPos INT, @w_Length INT, @w_Delimeter_pos INT, @w_tmp_txt VARCHAR(48), @w_Delimeter_Len TINYINT

    IF LEN(@p_SourceText) = 0
    BEGIN
        SET @w_Continue = 0 -- force early exit
    END
    ELSE
    BEGIN
        -- parse the original @p_SourceText array into a temp table
        SET @w_Continue = 1
        SET @w_StartPos = 1
        SET @p_SourceText = RTRIM(LTRIM(@p_SourceText))
        SET @w_Length = DATALENGTH(RTRIM(LTRIM(@p_SourceText)))
        SET @w_Delimeter_Len = LEN(@p_Delimeter)
    END

    WHILE @w_Continue = 1
    BEGIN
        SET @w_Delimeter_pos = CHARINDEX(@p_Delimeter, SUBSTRING(@p_SourceText, @w_StartPos, @w_Length - @w_StartPos + @w_Delimeter_Len))

        IF @w_Delimeter_pos > 0 -- delimeter(s) found, get the value
        BEGIN
            SET @w_tmp_txt = LTRIM(RTRIM(SUBSTRING(@p_SourceText, @w_StartPos, @w_Delimeter_pos - 1)))
            SET @w_StartPos = @w_Delimeter_pos + @w_StartPos + @w_Delimeter_Len - 1
        END
        ELSE -- No more delimeters, get last value
        BEGIN
            SET @w_tmp_txt = LTRIM(RTRIM(SUBSTRING(@p_SourceText, @w_StartPos, @w_Length - @w_StartPos + @w_Delimeter_Len)))

            SELECT @w_Continue = 0
        END

        INSERT INTO @retTable
        VALUES (@w_tmp_txt)
    END

    RETURN
END
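
As a quick sanity check, calling the function against one of the sample Page_ID values should return one row per ID:

SELECT Value FROM dbo.fn_ParseText2Table('48,51,94', ',')
-- returns three rows: 48, 51, 94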
irfandar
  • I beg you not to use that string splitter. It is horrible for performance. You have a loop inside of a multi statement table valued function. See the links in my answer for much better options for a string splitter. – Sean Lange Sep 11 '17 at 19:05