I am trying to create a contingency matrix of products from a sales file. Let's say my orders file looks like this:
Orders Products
order_no_1 prod_sku_1
order_no_1 prod_sku_2
order_no_1 prod_sku_3
order_no_2 prod_sku_4
order_no_3 prod_sku_2
order_no_3 prod_sku_4
I need my contigency matrix to look like this:
prod_sku_1 prod_sku_2 prod_sku_3 prod_sku_4
prod_sku_1 1 1 1 0
prod_sku_2 1 2 1 1
prod_sku_3 1 1 1 0
prod_sku_4 0 1 0 2
What do the numbers mean? It's the number of times a product has been sold in an order along with other products.
According to the first line of the contingency matrix, prod_sku_1
was sold in one order, prod_sku_1
was sold alongside with prod_sku_2
in one order, prod_sku_3
was sold alongside with prod_sku_1
in one order, prod_sku_1
was never sold in an order where prod_sku_4
was sold.
As you can see, the top right half of the triangle table is mirroring the bottom left triangle table.
What I did was to use the table
function with two parameters: order_no and product_sku, giving me a contingency matrix like the following:
prod_sku_1 prod_sku_2 prod_sku_3 prod_sku_4
order_no_1 1 1 1 0
order_no_2 0 0 0 1
order_no_3 0 1 0 1
As you can see, I need the rows to represent products, just like the columns. And the matrix values to match the number of time each product was sold at the same time with all the other products.
I could loop through each order but that wouldn't be very efficient and I can't seem to find a solution to this.
Thank you in advance