1
dtm <- DocumentTermMatrix(reuters, control=list(wordLengths=c(1,Inf)))

I would like to turn dtm into a term-term matrix. What I tried below is incorrect:

dtm <- dtm %*% t(dtm)

How might it be done?

joran
YangJ

3 Answers

2

If I understand the structure of a document-term matrix correctly, it is t(dtm) %*% dtm. See this answer.
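To see why the order of the product matters, here is a minimal sketch on a made-up dense matrix in base R. The documents, terms, and counts are invented for illustration; for a real tm object you would likely convert with as.matrix(dtm) first, as other answers here do.

```r
# Toy document-term matrix: 3 documents (rows) x 4 terms (columns).
# All names and counts are hypothetical.
m <- matrix(c(1, 0, 2, 0,
              0, 1, 1, 0,
              1, 1, 0, 3),
            nrow = 3, byrow = TRUE,
            dimnames = list(Docs = c("d1", "d2", "d3"),
                            Terms = c("oil", "price", "crude", "opec")))

doc_doc   <- m %*% t(m)   # documents x documents (what the question computes)
term_term <- t(m) %*% m   # terms x terms (what the question is after)

dim(doc_doc)
dim(term_term)
```

With documents in rows, m %*% t(m) pairs up documents, while t(m) %*% m pairs up terms, which is the matrix the question asks for.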

Community
mhermans
0

I believe the following approach would work (note that you are creating a Boolean, or perhaps an adjacency, matrix):

t(as.matrix(dtm)) %*% as.matrix(dtm)

For a big dtm you will run into R's memory limits when using as.matrix. The Matrix package can help. Note that I swap i and j to build the transposed matrix first.

library(tm)
library(Matrix)

data("acq")
dtm <- DocumentTermMatrix(acq, control=list(wordLengths=c(1,Inf)))
tdm <- t(dtm)

# Swap i and j so that Xt is terms x docs and X is docs x terms
Xt <- sparseMatrix(j=dtm$i, i=dtm$j, x=dtm$v)
X <- sparseMatrix(j=tdm$i, i=tdm$j, x=tdm$v)

# Term-term matrix
Xt %*% X

# For easier viewing
(Xt %*% X)[1:20, 1:20]
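As a smaller sketch of the same idea, the triplets below are hypothetical stand-ins for dtm$i, dtm$j and dtm$v. It builds one sparse matrix and uses crossprod() from Matrix instead of constructing the transpose separately; the dims and dimnames arguments are my additions, not part of the code above.

```r
library(Matrix)

# Hypothetical triplets in the same layout as dtm$i, dtm$j and dtm$v:
# document index, term index, count.
i <- c(1, 1, 2, 2, 3, 3, 3)
j <- c(1, 3, 2, 3, 1, 2, 4)
v <- c(1, 2, 1, 1, 1, 1, 3)
terms <- c("oil", "price", "crude", "opec")

# Docs x terms sparse matrix. dims is given explicitly so that the shape
# does not depend on the largest index that happens to occur.
X <- sparseMatrix(i = i, j = j, x = v, dims = c(3, 4),
                  dimnames = list(NULL, terms))

# crossprod(X) computes t(X) %*% X and keeps the result sparse.
tt <- crossprod(X)
tt
```

For a real dtm, passing dims = dim(dtm) and dimnames = dimnames(dtm) should keep the term labels on the result, though I have not checked that against every tm version.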
Tyler Rinker
0
library(tm)

TDM <- TermDocumentMatrix(x) # form a term-document matrix (x is your corpus)

termDocMatrix <- as.matrix(TDM) # convert the TDM into an ordinary matrix

termDocMatrix[termDocMatrix>=1] <- 1    # change the TDM into a Boolean matrix

# term adjacency matrix
termMatrix <- termDocMatrix %*% t(termDocMatrix)

termMatrix[1:10,1:10]  # inspect terms numbered 1 to 10
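To illustrate what the Boolean step changes, here is a small sketch on an invented term-document matrix in base R (the terms, documents, and counts are hypothetical). Without the Boolean step the product is weighted by raw term frequencies; with it, each entry counts documents.

```r
# Toy term-document matrix: 3 terms (rows) x 3 documents (columns).
tdm <- matrix(c(2, 0, 1,
                1, 1, 0,
                0, 3, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("oil", "price", "opec"), c("d1", "d2", "d3")))

counts <- tdm %*% t(tdm)       # weighted by raw term frequencies

bool <- tdm
bool[bool >= 1] <- 1           # presence/absence only
cooc <- bool %*% t(bool)

diag(cooc)                     # number of documents containing each term
cooc["oil", "price"]           # number of documents containing both terms
counts["oil", "price"]         # frequency-weighted value for the same pair
```

Which of the two you want depends on whether repeated occurrences within a document should count more than once.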
lmo
  • Hi, welcome to Stack Overflow. Please describe your answer in more detail. When an answer relies on a link, the linked page may be removed, which makes the answer useless to other people in the future. – Ashkan S Aug 27 '16 at 17:38