I am trying to use R to create a frequency table of products (by category) that are often bought together.
I have data the contains the following information:
OrderID CategoryCode4
On_line_Data2 <- read_excel("On-line Data2.xlsx", col_types =c("text","text"))
Effectively what it is is, people have bought multiple products and we want to create a frequency chart for how many times 'A' was bought with 'B'; 'A' was bought with 'C'; 'A' with 'D'; 'B' was bought with 'C'; 'B' with 'D'; 'C' with 'D' across a large list of items.
The orderID is unique for every order but can be repeated for each line as it is recorded against each product (which is only represented by a category code. Each category code can be repeated in each order so the data could be below:
OrderID CategoryCode4
---------------------------
Order1 catA
Order1 catA
Order1 catB
Order2 catA
Order2 catB
Order3 catA
Order3 catC
Order4 catA
Order4 catD
Order5 catA
Order5 catE
The output would be something like
CatA & CatB 2
CatA & CatC 1
CatA & CatD 1
CatA & CatE 1
I don't care if the output has 'CatA & CatB = 2
' as well as 'CatB & CatA = 2
' or this is equal to 3 because of the 2 times of A in Order1 although this is not idea.
I am completely stuck, I'm not even sure what to Google to be able to do this. Any help would be GREATLY appreciated.