I have a data set of users and products
user product
a X
a Y
a Z
b Z
c Y
c Z
...
from which I'd like a list of all pairwise combinations of products and the frequency of users that have that particular combination, e.g.
product_1 product_2 N
X Y 1 <- user a
X Z 1 <- user a
Y Z 2 <- users a,c
...
So far, I've successfully made a list of all product combinations
data = data.frame(users=c('a','a','a','b','c','c'), products = c("X","Y","Z","Z","Y","Z"))
combinations = t(combn(unique(data$products), 2))
but when I implemented an ugly for
-loop to iterate through the combinations and fetch info from the original data set it just wouldn't ever finish (actual data is about 1k products and 500k users, 25m rows in total).
What strategy would you use to implement this?
EDIT
Added a column of data to make things clearer. The result should count the number of users that have both product_1 and product_2 (even if they have other products as well).