0

I have a table that has customer id and product id combination. I want to identify product holding combinations that have the greatest yield. i.e. what are the product combinations and product sub combinations or individual products that have the greatest yield.

For example, if I have have a combination of 3 products, I want to rank it based on other combinations or individual occurrences of these products. Trying to do this so that I can identify the combinations that would have the greatest yield.

Please share your thoughts on this. DB - SQL Server.

CREATE TABLE [dbo].[spk_bkup_cust_prod](
    [cust_id] [varchar](100) NULL,
    [prod_id] [varchar](100) NULL
) ON [PRIMARY]

GO
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust1', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust1', N'prod2')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust2', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust3', N'prod2')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust3', N'prod3')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust4', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust5', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust5', N'prod2')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust5', N'prod3')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust6', N'prod1')

prodset             prodrnk_max
prod1               3
prod1-prod2         4
prod1-prod2-prod3   6
prod2-prod3         1

In the above example, if I study prod1 and prod2, I would have hit 4 customers. I would have hit one customer that has prod1-prod2 and 3 customers that have only prod1.

If I study prod1, prod2 and prod3, then I would have hit all customers.

Tried to use a combination of string_split and XML PATH. It helped integrate single product sub holdings but not multi product sub holdings.


with custprod as (
    SELECT
        distinct cust_id , prodset = STUFF((
              SELECT '-' + cp_grp.prod_id
              FROM dbo.spk_bkup_cust_prod cp_grp
              WHERE cp.cust_id = cp_grp.cust_id
              FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM dbo.spk_bkup_cust_prod cp
), prodcnt as (
    select prodset, count(*) rcnt from custprod group by prodset
), proddis as (
    select 
        prodcnt.prodset
        , prodcnt.rcnt
        , value indivprod
        --, convert(varchar,value)
    from 
    prodcnt
    cross apply string_split(prodset,'-')
), prodrnk as (
    select proddis.*
    , case when proddis.indivprod != proddis.prodset and prodcnt.prodset is not null then  proddis.rcnt + prodcnt.rcnt  else proddis.rcnt end prodrnk 
    from proddis 
    left join prodcnt on prodcnt.prodset = proddis.indivprod
)
select
    prodset, max(prodrnk) prodrnk_max
from 
prodrnk
group by prodset  

I thought this was an wrong approach to begin with so did not attach originally.

Prak_Rum
  • 25
  • 1
  • 11
  • Sorry. I tried to use combination of XML PATH and string_split. I was able to link in holdings and single product holdings. Couldn't link in multi product holdings. Will update the comment. – Prak_Rum Aug 05 '20 at 06:26
  • What is the logic for product combination? – mkRabbani Aug 05 '20 at 06:52
  • product combination is just different combinations of products that a customer holds. In the example above, prod1-prod2 is a combination that the customer1 holds. – Prak_Rum Aug 05 '20 at 06:54
  • Do your expected output (given) is sync with data you have provided (insert script)? – mkRabbani Aug 05 '20 at 06:59
  • Yes mate. The expected result and the data are consistent. Which part do you think is incorrect? I'll try to explain better. – Prak_Rum Aug 05 '20 at 07:02
  • How you get "prod1-prod2 > 4"? I get them twice in your data for customer 1 and 5. Another question - 1 customer have how many product? is there any limit or unlimited? – mkRabbani Aug 05 '20 at 07:13
  • prod1-prod2 is 4 because there is one customer with prod1-prod2 prod combination and 3 with prod1, making it 4. A customer can have any number of products. – Prak_Rum Aug 05 '20 at 09:46

2 Answers2

1

If I understand correctly, it would be like this.

with custprod as (
    SELECT
        distinct cust_id , prodset = STUFF((
              SELECT '-' + cp_grp.prod_id
              FROM dbo.spk_bkup_cust_prod cp_grp
              WHERE cp.cust_id = cp_grp.cust_id
              FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM dbo.spk_bkup_cust_prod cp
), prodcnt as (
    select
        cust_id,
        prod_id,
        count(*) over(partition by cust_id) as cnt -- number of products each customer has
    from dbo.spk_bkup_cust_prod
), custcnt as (
    select distinct
        p1.cust_id,
        count(*) over(partition by p1.cust_id) as cnt -- number of customers contains the same projects
    from prodcnt p1
    inner join prodcnt p2 on p1.prod_id = p2.prod_id
    group by p1.cust_id, p2.cust_id
    having max(p2.cnt) = count(*)
)

select
    prodset,
    max(c.cnt) as prodrnk_max
from custprod p 
inner join custcnt c on c.cust_id = p.cust_id
group by prodset
manabu
  • 471
  • 1
  • 5
  • 9
  • this gets broken when we have a gap in the holdings. like, if a customer hold prod1 and prod3, this does not seem to work. – Prak_Rum Aug 05 '20 at 09:52
0

This seems to work for me, atleast for the small sample set.


drop table if exists dbo.#CustProdTag
SELECT distinct top 100 percent
    cust_id,
    prodset =   STUFF(
                    (
                        SELECT '%' + cp_grp.prod_id
                        FROM dbo.spk_bkup_cust_prod cp_grp
                        WHERE cp.cust_id = cp_grp.cust_id
                        FOR XML PATH(''), TYPE
                    --  order by cp_grp.prod_id
                        ).value('.', 'NVARCHAR(MAX)'), 1, 1, ''
                        
                )
INTO    dbo.#CustProdTag
FROM    dbo.spk_bkup_cust_prod cp
ORDER BY prodset

------------------------------------------------------------------------

drop table if exists dbo.#ProdCombo
select distinct prodset
into dbo.#ProdCombo
from dbo..#CustProdTag

------------------------------------------------------------------------

SELECT * FROM dbo.#ProdCombo
SELECT * FROM dbo.#CustProdTag  ORDER BY prodset

------------------------------------------------------------------------

SELECT
    pc.prodset,
    COUNT(*)        [RecCnt]
FROM
    dbo.#ProdCombo                  pc
    LEFT JOIN   dbo.#CustProdTag    cpt     ON  pc.prodset like '%' + cpt.prodset + '%'
GROUP BY
    pc.prodset
ORDER BY
    pc.prodset


Prak_Rum
  • 25
  • 1
  • 11