I have a table that has customer id and product id combination. I want to identify product holding combinations that have the greatest yield. i.e. what are the product combinations and product sub combinations or individual products that have the greatest yield.
For example, if I have have a combination of 3 products, I want to rank it based on other combinations or individual occurrences of these products. Trying to do this so that I can identify the combinations that would have the greatest yield.
Please share your thoughts on this. DB - SQL Server.
CREATE TABLE [dbo].[spk_bkup_cust_prod](
[cust_id] [varchar](100) NULL,
[prod_id] [varchar](100) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust1', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust1', N'prod2')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust2', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust3', N'prod2')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust3', N'prod3')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust4', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust5', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust5', N'prod2')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust5', N'prod3')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust6', N'prod1')
prodset prodrnk_max
prod1 3
prod1-prod2 4
prod1-prod2-prod3 6
prod2-prod3 1
In the above example, if I study prod1 and prod2, I would have hit 4 customers. I would have hit one customer that has prod1-prod2 and 3 customers that have only prod1.
If I study prod1, prod2 and prod3, then I would have hit all customers.
Tried to use a combination of string_split and XML PATH. It helped integrate single product sub holdings but not multi product sub holdings.
with custprod as (
SELECT
distinct cust_id , prodset = STUFF((
SELECT '-' + cp_grp.prod_id
FROM dbo.spk_bkup_cust_prod cp_grp
WHERE cp.cust_id = cp_grp.cust_id
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM dbo.spk_bkup_cust_prod cp
), prodcnt as (
select prodset, count(*) rcnt from custprod group by prodset
), proddis as (
select
prodcnt.prodset
, prodcnt.rcnt
, value indivprod
--, convert(varchar,value)
from
prodcnt
cross apply string_split(prodset,'-')
), prodrnk as (
select proddis.*
, case when proddis.indivprod != proddis.prodset and prodcnt.prodset is not null then proddis.rcnt + prodcnt.rcnt else proddis.rcnt end prodrnk
from proddis
left join prodcnt on prodcnt.prodset = proddis.indivprod
)
select
prodset, max(prodrnk) prodrnk_max
from
prodrnk
group by prodset
I thought this was an wrong approach to begin with so did not attach originally.