I'm trying to create something like a correlation matrix in SQL server, looking at relationships between values in a single field. The goal is something like this (I am restricted from sharing the actual fields and values): If I had a list of purchases and products, can I output a matrix that shows how many individuals who purchased a given item also purchased another distinct item. Example: If this were my data:
[Account Item Amount
ID1 Bands $20.00
ID2 Bands $20.00
ID4 Foam Roller $40.00
ID5 Foam Roller $40.00
ID3 Shirt $30.00
ID1 Weights $100.00
ID4 Weights $100.00
ID1 Yoga Mat $25.00
ID2 Yoga Mat $25.00
ID4 Yoga Mat $25.00
ID5 Yoga Mat $25.00][1]
I'd be looking to know how many accounts who bought a Yoga Mat also bought a Foam Roller, etc. My desired output is something like this:
[Bands Foam Roller Shirt Weights Yoga Mat
Bands 2 1 1
Foam Roller 2 1 1
Shirt 1
Weights 2 2
Yoga Mat 2 2 1 4][1]
The business question is two-fold:
- How many individuals making a purchase in one product category also purchased in another category?
- What was the average spend in each category?
I've tried a pivot query but it isn't working. I'm sure I haven't structured it correctly. Does anyone have a suggestion or another thread I should look at? (I followed this one, but it isn't getting me there with only one field: Creating a correlation matrix in SQL Server)
My result set from this is just a single row with the item names across the top and totals.
edited because my rows and columns were all run together; now pasted as images. edited again due to feedback that the 'convert rows to columns' question is the same. Unfortunately, it isn't. I can get my rows into columns with values; I've done that.
What I'm really trying to do is see the crosstab or overlap of values in columns vs each other. eg How many Yoga Mat purchasers also bought a Foam Roller?