How to create a correlation matrix or pivot based on values in a single field in SQL Server

Question

I'm trying to create something like a correlation matrix in SQL server, looking at relationships between values in a single field. The goal is something like this (I am restricted from sharing the actual fields and values): If I had a list of purchases and products, can I output a matrix that shows how many individuals who purchased a given item also purchased another distinct item. Example: If this were my data:

[Account    Item        Amount
ID1 Bands       $20.00 
ID2 Bands       $20.00 
ID4 Foam Roller $40.00 
ID5 Foam Roller $40.00 
ID3 Shirt       $30.00 
ID1 Weights     $100.00 
ID4 Weights     $100.00 
ID1 Yoga Mat    $25.00 
ID2 Yoga Mat    $25.00 
ID4 Yoga Mat    $25.00 
ID5 Yoga Mat    $25.00][1]

I'd be looking to know how many accounts who bought a Yoga Mat also bought a Foam Roller, etc. My desired output is something like this:

        [Bands  Foam Roller Shirt   Weights Yoga Mat
Bands       2               1   1
Foam Roller     2           1   1
Shirt                   1       
Weights     2                   2
Yoga Mat    2   2           1   4][1]

The business question is two-fold:

How many individuals making a purchase in one product category also purchased in another category?
What was the average spend in each category?

I've tried a pivot query but it isn't working. I'm sure I haven't structured it correctly. Does anyone have a suggestion or another thread I should look at? (I followed this one, but it isn't getting me there with only one field: Creating a correlation matrix in SQL Server)

My result set from this is just a single row with the item names across the top and totals.

edited because my rows and columns were all run together; now pasted as images. edited again due to feedback that the 'convert rows to columns' question is the same. Unfortunately, it isn't. I can get my rows into columns with values; I've done that.

What I'm really trying to do is see the crosstab or overlap of values in columns vs each other. eg How many Yoga Mat purchasers also bought a Foam Roller?

Does this answer your question? [Efficiently convert rows to columns in sql server](https://stackoverflow.com/questions/15745042/efficiently-convert-rows-to-columns-in-sql-server) — lemon, Dec 14 '22 at 22:00
It doesn't. I worked through that one before asking. Thank you though. — CPoncelow, Dec 14 '22 at 22:34
You will need to join the table to itself on `Account`, group by `First.Item` and `Second.Item`, sekect bot hof those plus a `COUNT(*)`, and then feed that to a `PIVOT`. — T N, Dec 14 '22 at 23:19

T N · Answer 1 · 2022-12-15T00:04:45.507

You will need to join the table to itself before feeding it to the PIVOT.

The pivot data value must be separate from the row and column values. In this case Account is used, but it could just as well have been 1 AS DummyValue.

SELECT PVT.*
FROM (
    SELECT First.Account, First.Item, Second.Item AS OtherItem
    FROM @Data First
    JOIN @Data Second ON Second.Account = First.Account
) D
PIVOT (
  COUNT(Account)
  FOR OtherItem IN ([Bands], [Foam Roller], [Shirt], [Weights], [Yoga Mat])
) PVT

Result:

Item	Bands	Foam Roller	Shirt	Weights	Yoga Mat
Bands	2	0	0	1	2
Foam Roller	0	2	0	1	2
Shirt	0	0	1	0	0
Weights	1	1	0	2	2
Yoga Mat	2	2	0	2	4

See this db<>fiddle for a demo.

score 0 · Accepted Answer · answered Dec 14 '22 at 23:15

Something like this perhaps:

if object_id('tempdb..#t') is not null
    drop table #t
select LTRIM(rtrim(col1)) AS id, ltrim(rtrim(col2)) AS prod, stuff(ltrim(rtrim(col3)), 1, 1, '') AS price
into #t
from 

(
    VALUES  (N'ID1', 'Bands       ','$20.00 ')
    ,   (N'ID2', 'Bands       ','$20.00 ')
    ,   (N'ID4', 'Foam Roller ','$40.00 ')
    ,   (N'ID5', 'Foam Roller ','$40.00 ')
    ,   (N'ID3', 'Shirt       ','$30.00 ')
    ,   (N'ID1', 'Weights     ','$100.00 ')
    ,   (N'ID4', 'Weights     ','$100.00 ')
    ,   (N'ID1', 'Yoga Mat    ','$25.00 ')
    ,   (N'ID2', 'Yoga Mat    ','$25.00 ')
    ,   (N'ID4', 'Yoga Mat    ','$25.00 ')
    ,   (N'ID5', 'Yoga Mat    ','$25.00')
) t (col1, col2, col3)

;with cte as (
    select distinct prod from #t
    )
, cte2 AS (
select c.prod, t2.prod AS prod2, COUNT(*) AS cnt
from CTE c
inner join #t t
    ON  t.prod = c.prod
inner join #t t2
    ON  t2.id = t.id
GROUP BY c.prod, t2.prod
)
select prod, isnull(bands, 0) AS bands, isnull([foam roller],0) AS [foam roller], isnull(shirt, 0) AS shirt, isnull(Weights, 0) AS Weights, ISNULL([Yoga mat], 0) AS [Yoga Mat]
from cte2  c
pivot (MAX(c.cnt) for prod2 IN ([Bands], [Foam Roller], [Shirt],[Weights], [Yoga Mat])) p

How to create a correlation matrix or pivot based on values in a single field in SQL Server

2 Answers2