I am trying to do matrix multiplication on a sparse matrix created with the quanteda package, using the data.table package, related to this thread here. So

require(quanteda) 

mytext <- c("Let the big dogs hunt", "No holds barred", "My child is an honor student")     
myMatrix <-dfm(mytext, ignoredFeatures = stopwords("english"), stem = TRUE) #a data.table
as.matrix(myMatrix) %*% transpose(as.matrix(myMatrix))

How can I get the matrix multiplication working here with the quanteda package and sparse matrices?

hhh

2 Answers


Use t(), not transpose(), for the matrix multiplication. t() is base R's matrix transpose, whereas transpose() (e.g. data.table::transpose) is meant for lists and data frames:

as.matrix(myMatrix) %*% t(as.matrix(myMatrix))

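Also, as noted in the comments, as.matrix() returns a dense matrix, whereas Matrix::Matrix() can return a sparse one. Neither conversion is needed here, though, because dfm() already returns a sparse matrix, so it is better to write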

myMatrix %*% t(myMatrix)
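To see the dense/sparse difference concretely, here is a small sketch that uses only the Matrix package. A plain numeric matrix with made-up values stands in for the dfm, so the sketch does not depend on any particular quanteda version:

```r
library(Matrix)

# Hypothetical, mostly-zero data standing in for a document-feature matrix
dense <- matrix(c(1, 0, 0,
                  0, 0, 2,
                  0, 3, 0), nrow = 3, byrow = TRUE)

# A base R matrix (what as.matrix() gives you) stores every entry
is(dense, "sparseMatrix")    # FALSE

# Matrix::Matrix() with sparse = TRUE stores only the non-zero entries
sp <- Matrix(dense, sparse = TRUE)
is(sp, "sparseMatrix")       # TRUE

# Multiplying a sparse matrix by its transpose keeps the result sparse
prod_sp <- sp %*% t(sp)
is(prod_sp, "sparseMatrix")  # TRUE

# Same numbers as the dense computation
all(as.matrix(prod_sp) == dense %*% t(dense))  # TRUE
```

The point is that staying in sparse storage avoids materialising all the zeros, which matters for real document-feature matrices with many features.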

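and potentially even better, using the dedicated cross-product functions (tcrossprod(x) computes x %*% t(x), while crossprod(x) computes t(x) %*% x):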

crossprod(myMatrix) 
tcrossprod(myMatrix) 

However, these require numeric/complex matrix/vector arguments and fail with the example in the question:

require(quanteda)  
mytext <- c("Let the big dogs hunt", "No holds barred", "My child is an honor student")      
myMatrix <-dfm(mytext, ignoredFeatures = stopwords("english"), stem = TRUE) 
crossprod(myMatrix) 
tcrossprod(myMatrix)
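For reference, tcrossprod(x) equals x %*% t(x) (documents by documents here) and crossprod(x) equals t(x) %*% x (features by features). The sketch below checks this on a hypothetical 3 x 4 document-feature count matrix built directly with Matrix::sparseMatrix() rather than with dfm(). It therefore says nothing about whether a given quanteda version's dfm class works with these functions:

```r
library(Matrix)

# Hypothetical 3-document x 4-feature count matrix, built from (i, j, x) triplets
X <- sparseMatrix(i = c(1, 1, 2, 3, 3),
                  j = c(1, 2, 3, 2, 4),
                  x = c(1, 2, 1, 1, 3),
                  dims = c(3, 4),
                  dimnames = list(paste0("text", 1:3), paste0("feat", 1:4)))

# tcrossprod(X) is X %*% t(X): documents x documents
dd <- tcrossprod(X)
# crossprod(X) is t(X) %*% X: features x features
ff <- crossprod(X)

dim(dd)  # 3 3
dim(ff)  # 4 4

# Same values as the explicit products (dimnames ignored in the comparison)
all.equal(as.matrix(dd), as.matrix(X %*% t(X)), check.attributes = FALSE)  # TRUE
all.equal(as.matrix(ff), as.matrix(t(X) %*% X), check.attributes = FALSE)  # TRUE

# Off-diagonal entries count shared features: text1 and text3 share feat2 (2 * 1)
dd[1, 3]  # 2
```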
hhh
  • Also, `as.matrix` will not create a sparse matrix. Use `Matrix::Matrix` instead. – dww Jan 09 '17 at 15:53
  • @dww Super important point, thank you! +1. How is `Matrix::Matrix` different from `Matrix::sparseMatrix`? – hhh Jan 09 '17 at 16:04
  • I'm not familiar with quanteda, but I just installed it, and it seems that `dfm` already returns a sparse matrix of class `dfm-class`. In that case all you need is `myMatrix %*% t(myMatrix)`. Are you using an old version of quanteda that returns a data.table? Also, in my version the `ignoredFeatures` argument is ignored. – dww Jan 09 '17 at 17:15
  • I always thought this is what we have `tcrossprod` for – David Arenburg Jan 09 '17 at 20:53
  • @DavidArenburg Can you clarify? crossprod/tcrossprod throws an error about requiring numeric/complex matrix/vector arguments. – hhh Jan 10 '17 at 06:41
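On the `Matrix::Matrix` vs `Matrix::sparseMatrix` question in the comments: as I understand it, Matrix() converts existing data (typically a dense base R matrix) into a Matrix object, while sparseMatrix() builds a sparse matrix directly from the row indices, column indices and values of the non-zero entries, so no dense copy is ever created. A minimal sketch with made-up values showing both routes to the same matrix:

```r
library(Matrix)

# Route 1: start from a dense matrix and convert it; sparse = TRUE forces sparse storage
dense <- matrix(c(0, 2, 0,
                  1, 0, 0), nrow = 2, byrow = TRUE)
A <- Matrix(dense, sparse = TRUE)

# Route 2: describe only the non-zero entries as (row, column, value) triplets
B <- sparseMatrix(i = c(1, 2), j = c(2, 1), x = c(2, 1), dims = c(2, 3))

is(A, "sparseMatrix")  # TRUE
is(B, "sparseMatrix")  # TRUE

# Both describe the same 2 x 3 matrix
all(as.matrix(A) == as.matrix(B))  # TRUE
```

For large, very sparse data the triplet route is usually the one to prefer, since the dense intermediate in route 1 may not even fit in memory.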

This works just fine:

mytext <- c("Let the big dogs hunt", 
            "No holds barred", 
            "My child is an honor student")     
myMatrix <- dfm(mytext)

myMatrix %*% t(myMatrix)
## 3 x 3 sparse Matrix of class "dgCMatrix"
##       text1 text2 text3
## text1     5     .     .
## text2     .     3     .
## text3     .     .     6

There is no need to coerce to a dense matrix using as.matrix(). Note that the result is no longer a "dfmSparse" object, because it is no longer a matrix of documents by features (it is documents by documents).

Ken Benoit