9

I have a simple matrix, e.g.

test <- matrix(c("u1","p1","u1","p2","u2","p2","u2",
                 "p3","u3","p1","u4","p2","u5","p1",
                 "u5","p3","u6","p3","u7","p4","u7",
                 "p3","u8","p1","u9","p4"),
               ncol=2,byrow=TRUE) 
colnames(test) <- c("user","product")
test1<-as.data.frame(test)

test:

   user   product
1  u1      p1
2  u1      p2
3  u2      p2 
4  u2      p3
5  u3      p1
6  u4      p2
7  u5      p1
8  u5      p3
9  u6      p3
10 u7      p4
11 u7      p3
12 u8      p1
13 u9      p4

I want to count how many users bought product pair together, such as p1&p2, p2& p3...

table(test1$product,test1$product) give me this :

     p1   p2  p3  p4
 p1   4   0   0   0
 p2   0   3   0   0
 p3   0   0   4   0
 p4   0   0   0   2

How can I get the right result as :

     p1   p2  p3  p4
 p1   4   1   1   0
 p2   1   3   1   0
 p3   1   1   4   1
 p4   0   0   1   2
Spacedman
  • 92,590
  • 12
  • 140
  • 224
linus
  • 331
  • 2
  • 9

3 Answers3

14

Looking at your desired output, you are looking for the crossprod function:

crossprod(table(test1))
#        product
# product p1 p2 p3 p4
#      p1  4  1  1  0
#      p2  1  3  1  0
#      p3  1  1  4  1
#      p4  0  0  1  2

This is the same as crossprod(table(test1$user, test1$product)) (reflecting Dennis's comment).

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
4

A similar question tagged to this post requested an efficient solution, but got deleted undeleted now. We decided to post the solution here.

Here is one with RcppEigen to do the crossproduct

library(RcppEigen)
library(inline)
prodFun <- '
        typedef Eigen::Map<Eigen::MatrixXi> MapMti;
        const MapMti B(as<MapMti>(BB));
        const MapMti C(as<MapMti>(CC));
        return List::create(B.adjoint() * C);
        '

funCPr <- cxxfunction(signature(BB= "matrix", CC = "matrix"),
                     prodFun, plugin = "RcppEigen") 
tbl <- table(test1)
funCPr(tbl, tbl)[[1]]
#     [,1] [,2] [,3] [,4]
#[1,]    4    1    1    0
#[2,]    1    3    1    0
#[3,]    1    1    4    1
#[4,]    0    0    1    2

Benchmarks

set.seed(24)
test2 <- data.frame(user = sample(1:5000, 1e6, replace=TRUE),
    product = sample(paste0("p", 1:50), 1e6, replace = TRUE),
    stringsAsFactors=FALSE)
tbl1 <- table(test2)

library(microbenchmark)
microbenchmark(cPP = funCPr(tbl1, tbl1)[[1]], 
              CrossP = crossprod(tbl1),
              adjMat = adjmat(tbl1)$adjacency,
              unit = "relative", times = 10L)
#Unit: relative
#   expr      min       lq     mean   median       uq       max neval cld
#    cPP 1.000000 1.000000 1.000000 1.000000 1.000000  1.000000    10  a 
# CrossP 2.079867 2.070509 2.234376 2.074388 2.290516  2.676798    10  a 
# adjMat 6.223034 6.500791 9.619088 7.197824 7.771270 31.394812    10   b

NOTE: This could be made more efficient by doing the table in Rcpp

Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662
3

Ananda's solution is superior (it is lighter weight and requires no external package) but I am putting down another. I believe this is called an adjacency matrix (smarter people feel free to edit this if I'm wrong):

library(qdap)
adjmat(table(test1))$adjacency

##        product
## product p1 p2 p3 p4
##      p1  4  1  1  0
##      p2  1  3  1  0
##      p3  1  1  4  1
##      p4  0  0  1  2
Tyler Rinker
  • 108,132
  • 65
  • 322
  • 519