2

I am trying to use R to create a frequency table of products (by category) that are often bought together.

I have data the contains the following information:

OrderID CategoryCode4

On_line_Data2 <- read_excel("On-line Data2.xlsx", col_types =c("text","text"))

Effectively what it is is, people have bought multiple products and we want to create a frequency chart for how many times 'A' was bought with 'B'; 'A' was bought with 'C'; 'A' with 'D'; 'B' was bought with 'C'; 'B' with 'D'; 'C' with 'D' across a large list of items.

The orderID is unique for every order but can be repeated for each line as it is recorded against each product (which is only represented by a category code. Each category code can be repeated in each order so the data could be below:

   OrderID   CategoryCode4
---------------------------
    Order1    catA
    Order1    catA
    Order1    catB
    Order2    catA
    Order2    catB
    Order3    catA
    Order3    catC
    Order4    catA
    Order4    catD
    Order5    catA
    Order5    catE

The output would be something like

CatA & CatB 2
CatA & CatC 1
CatA & CatD 1
CatA & CatE 1

I don't care if the output has 'CatA & CatB = 2' as well as 'CatB & CatA = 2' or this is equal to 3 because of the 2 times of A in Order1 although this is not idea.

I am completely stuck, I'm not even sure what to Google to be able to do this. Any help would be GREATLY appreciated.

PKumar
  • 10,971
  • 6
  • 37
  • 52
keith
  • 21
  • 4

0 Answers0