I have a dataframe/tibble of 2 columns - customerID, productDescription. I want to keep a tidy dataset but find all pairs of items purchased for each customer.
My goal is to then create a tilemap displaying the frequency of purchase or each product combination. How would this be possible given a sample dataset such as below?
Please note I do not want a column separating each purchase by comma - I only want 2 products showing as pairs. These pairs will be made up of all possible combinations of products bought by a unique ID
Data:
> dput(df)
structure(list(ID = structure(c(3L, 3L, 3L, 3L, 1L, 1L, 1L, 2L,
2L), .Label = c("abc", "def", "xyz"), class = "factor"), Product = structure(c(1L,
3L, 4L, 5L, 1L, 2L, 4L, 2L, 3L), .Label = c("Product1", "Product3",
"Product4", "Product5", "Product7"), class = "factor")), .Names = c("ID",
"Product"), row.names = c(NA, -9L), class = "data.frame")
> df
ID Product
1 xyz Product1
2 xyz Product4
3 xyz Product5
4 xyz Product7
5 abc Product1
6 abc Product3
7 abc Product5
8 def Product3
9 def Product4
What I would like to see:
Data:
> df
ID Product ProductPair
1 xyz Product1 Product1
2 xyz Product1 Product4
3 xyz Product1 Product5
4 xyz Product1 Product7
5 xyz Product4 Product1
6 xyz Product4 Product4
7 xyz Product4 Product5
8 xyz Product4 Product7
9 xyz Product5 Product1
10 xyz Product5 Product4
11 xyz Product5 Product5
12 xyz Product5 Product7
13 xyz Product7 Product1
14 xyz Product7 Product4
15 xyz Product7 Product5
16 xyz Product7 Product7
Note that here we have 4 products bought by xyz
, and a combination of 2 gives us 4^2 possible combinations of products.