I already have a term-document matrix and want to add a new document to it. Put another way, I want to join two term-document matrices.

My term-document matrix is:

     Docs
Term   1
eat    7
food   2
run    2
sick   3

The new document is: watch football match and eat food

After processing, I want my term-document matrix to be:

         Docs
Term     1   2
eat      7   1
food     2   1
run      2   0
sick     3   0
watch    0   1
football 0   1
match    0   1
and      0   1
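
As a minimal sketch of the target result, the same join can be written in base R on plain count matrices. This is an illustration only, not how tm combines its sparse matrices; `combine_term_matrices`, `old`, and `new` are hypothetical names, and the counts are copied from the tables above.

```r
# Hypothetical helper: combine two term-by-document count matrices,
# filling terms that are missing from either side with 0.
combine_term_matrices <- function(a, b) {
  terms <- union(rownames(a), rownames(b))
  out <- matrix(0, nrow = length(terms), ncol = ncol(a) + ncol(b),
                dimnames = list(Terms = terms,
                                Docs = c(colnames(a), colnames(b))))
  out[rownames(a), seq_len(ncol(a))] <- a            # columns of the first matrix
  out[rownames(b), ncol(a) + seq_len(ncol(b))] <- b  # columns of the second matrix
  out
}

# Existing matrix (document 1) and the new document (labelled 2 by assumption).
old <- matrix(c(7, 2, 2, 3), ncol = 1,
              dimnames = list(Terms = c("eat", "food", "run", "sick"), Docs = "1"))
new <- matrix(1, nrow = 6, ncol = 1,
              dimnames = list(Terms = c("watch", "football", "match", "and", "eat", "food"),
                              Docs = "2"))

combined <- combine_term_matrices(old, new)
print(combined)
```

Note that the new document needs a column name different from the existing one for the result to be indexable by document name.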

I've tried this:

library("SnowballC")
library("NLP")
library("tm")
library("lsa")

# mytermdm is the term-document matrix I already have

text2 <- "watch football match and eat food"
myCorpus <- Corpus(VectorSource(text2))

tdm2 <- TermDocumentMatrix(myCorpus, control = list(
  removeNumbers = TRUE,
  removePunctuation = TRUE,
  stopwords = stopwords_en,
  stemming = TRUE
))
mytdm3 <- c(mytermdm,tdm2)
inspect(mytdm3)

I get this:

TermDocumentMatrix (terms: 7, documents: 2)

Error in `[.simple_triplet_matrix`(x, terms, docs) :
    Repeated indices currently not allowed.
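
The error appears to come from duplicated document names: both matrices label their single document "1", so the combined matrix ends up with two columns of the same name. A minimal base-R sketch of checking for that clash before combining, using plain matrices as stand-ins for the real objects (`clashing_docs`, `old`, and `new` are hypothetical names, not part of tm):

```r
# Hypothetical helper: return the document names that appear in both matrices.
clashing_docs <- function(a, b) intersect(colnames(a), colnames(b))

# Stand-ins for mytermdm and tdm2; both call their only document "1".
old <- matrix(c(7, 2, 2, 3), ncol = 1,
              dimnames = list(Terms = c("eat", "food", "run", "sick"), Docs = "1"))
new <- matrix(1, nrow = 2, ncol = 1,
              dimnames = list(Terms = c("eat", "food"), Docs = "1"))

clash <- clashing_docs(old, new)
if (length(clash) > 0) {
  message("Rename these documents before combining: ",
          paste(clash, collapse = ", "))
}
```

The same check should apply to the real objects, assuming `colnames()` returns their document names as it does in the code in this thread.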
Hilfit19
  • Possible duplicate of https://stackoverflow.com/questions/47410866/r-inspect-document-term-matrix-results-in-error-repeated-indices-currently-not – akrun Apr 17 '18 at 04:44
  • Still confused. In that question the text file is known and the tdm is built from that text. In my case, I only know the text of the second document that I want to add to the tdm. Simply put, I loaded an existing tdm and then want to build a tdm that combines it with the second text – Hilfit19 Apr 17 '18 at 12:18
  • I have solved it. Before combining the two term-document matrices, I replace the document names in tdm2 using `colnames(tdm2) <- as.numeric(max(colnames(mytermdm)))+1` and then combine them – Hilfit19 Apr 17 '18 at 13:12
  • @Hilfit19, that is one of the solutions. You can also adjust the dimnames in mytdm3, or even set them at the beginning with the corpus and meta functions. If you want, I can write a longer answer that covers all of the options. – phiver Apr 17 '18 at 13:20
  • No @phiver, thanks for your kindness – Hilfit19 Apr 17 '18 at 13:35
  • You can post that as a solution – akrun Apr 17 '18 at 14:24

1 Answer

I have solved it. Before combining the two term-document matrices, I replace the document names in tdm2. The full algorithm:

library("SnowballC")
library("NLP")
library("tm")
library("lsa")

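# mytermdm is the term-document matrix I already have

text2 <- "watch football match and eat food"
myCorpus <- Corpus(VectorSource(text2))

tdm2 <- TermDocumentMatrix(myCorpus, control = list(
  removeNumbers = TRUE,
  removePunctuation = TRUE,
  stopwords = stopwords_en,
  stemming = TRUE
))

# My added fix: give the new document the next unused ID so the two
# matrices do not share a document name. Convert to numeric before max()
# so that "10" counts as larger than "9".
colnames(tdm2) <- max(as.numeric(colnames(mytermdm))) + 1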


mytdm3 <- c(mytermdm,tdm2)
inspect(mytdm3)
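
One detail of the renaming step is worth checking: column names are character strings, and `max()` on strings compares them lexicographically, so the numeric conversion has to happen before taking the maximum. A small base-R sketch, assuming the existing document names are all numeric strings (`doc_ids` and `next_doc_id` are hypothetical names):

```r
# Ten existing documents named "1" through "10".
doc_ids <- as.character(1:10)

max(doc_ids)              # "9": string comparison, "9" sorts after "10"
max(as.numeric(doc_ids))  # 10: numeric comparison

# Hypothetical helper: next unused numeric document ID.
next_doc_id <- function(ids) max(as.numeric(ids)) + 1

next_doc_id(doc_ids)      # 11
```

With only one existing document both orderings agree, so the difference only shows up once there are ten or more documents.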
Hilfit19