0

I am trying to create a contingency matrix of products from a sales file. Let's say my orders file looks like this:

Orders      Products
order_no_1  prod_sku_1
order_no_1  prod_sku_2
order_no_1  prod_sku_3
order_no_2  prod_sku_4
order_no_3  prod_sku_2
order_no_3  prod_sku_4

I need my contigency matrix to look like this:

            prod_sku_1   prod_sku_2   prod_sku_3   prod_sku_4
prod_sku_1      1            1            1            0     
prod_sku_2      1            2            1            1
prod_sku_3      1            1            1            0
prod_sku_4      0            1            0            2

What do the numbers mean? It's the number of times a product has been sold in an order along with other products. According to the first line of the contingency matrix, prod_sku_1 was sold in one order, prod_sku_1 was sold alongside with prod_sku_2 in one order, prod_sku_3 was sold alongside with prod_sku_1 in one order, prod_sku_1 was never sold in an order where prod_sku_4 was sold. As you can see, the top right half of the triangle table is mirroring the bottom left triangle table.

What I did was to use the table function with two parameters: order_no and product_sku, giving me a contingency matrix like the following:

             prod_sku_1    prod_sku_2    prod_sku_3   prod_sku_4
order_no_1       1             1             1            0
order_no_2       0             0             0            1
order_no_3       0             1             0            1

As you can see, I need the rows to represent products, just like the columns. And the matrix values to match the number of time each product was sold at the same time with all the other products.

I could loop through each order but that wouldn't be very efficient and I can't seem to find a solution to this.

Thank you in advance

tuxilina
  • 76
  • 1

0 Answers0