I am using RTextTools to build a training matrix and a model, which I will later apply to new documents to classify them.
EDIT: The matrix is a document-term matrix.
The problem is that, with certain documents, when I create new_matrix with the following line
new_matrix <- create_matrix(data$document, language="english", removeNumbers=FALSE, removePunctuation=TRUE, removeStopwords=TRUE, toLower=TRUE, stemWords=TRUE, minDocFreq=1, weighting=weightTfIdf, originalMatrix=matrix)
I get some NaN values, which make creating the corpus fail:
corpus <- create_corpus(new_matrix, data$value, testSize=1:100, virgin=FALSE)
It fails with the error:
Error in .csr.coo(x) : NA/NaN/Inf in foreign function call (arg 4)
I am not sure why some values are NaN. My guess is that it has to do with some words being present in new_matrix but not in the original matrix.
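A toy illustration of one way NaN can appear, assuming a hand-rolled dense tf-idf (length-normalized term frequency times log2(N / df)) rather than tm's own weightTfIdf, whose internals may guard against this case. A term from the original vocabulary that never occurs in the new documents has document frequency 0, so its idf is Inf, and 0 * Inf is NaN in R. The documents, terms and counts below are made up for illustration.

```r
# Toy document-term counts: 3 new documents x 4 terms from the original vocabulary.
# The term "delta" never occurs in the new documents (document frequency 0).
counts <- matrix(c(2, 0, 1, 0,
                   0, 1, 1, 0,
                   1, 1, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3),
                                 c("alpha", "beta", "gamma", "delta")))

# Naive dense tf-idf: normalized term frequency times log2(N / df).
tf       <- counts / rowSums(counts)
doc_freq <- colSums(counts > 0)
idf      <- log2(nrow(counts) / doc_freq)   # doc_freq == 0 gives Inf for "delta"
tfidf    <- sweep(tf, 2, idf, `*`)          # 0 * Inf gives NaN in the "delta" column

# Report which documents (row) and terms (col) hold non-finite weights.
bad <- which(!is.finite(tfidf), arr.ind = TRUE)
print(bad)
```

Running the same which(..., arr.ind = TRUE) check on as.matrix(new_matrix) would show which documents and terms are affected, which is a quick way to test the guess about vocabulary mismatch.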
How can I replace the NaN values with 0 in the resulting matrix?
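A minimal sketch for setting the non-finite weights to 0. It assumes new_matrix is a tm DocumentTermMatrix, which is stored as a slam simple_triplet_matrix whose stored (non-zero) entries live in the v component; zero_nonfinite is a hypothetical helper name, not part of RTextTools or tm. It also accepts a plain dense matrix, and it zeroes NA and Inf as well as NaN, since the error message complains about all three.

```r
# Hypothetical helper: set NaN, NA and Inf weights to 0.
# For a slam simple_triplet_matrix (the storage behind a tm DocumentTermMatrix)
# only the stored values in $v can be non-finite; unstored cells are already 0.
zero_nonfinite <- function(x) {
  if (inherits(x, "simple_triplet_matrix")) {
    x$v[!is.finite(x$v)] <- 0
  } else {
    x[!is.finite(x)] <- 0
  }
  x
}

# Usage with the objects from the question (not run here):
# new_matrix <- zero_nonfinite(new_matrix)
# corpus <- create_corpus(new_matrix, data$value, testSize=1:100, virgin=FALSE)

# Dense example: NaN and Inf both become 0.
m <- matrix(c(0.5, NaN, 0, Inf), nrow = 2)
print(zero_nonfinite(m))
```

Editing x$v directly keeps the matrix sparse and preserves its class and attributes (such as the weighting), instead of densifying it with as.matrix.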
Will doing that alter the result of the classification?
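Whether zeroing changes the classification depends on where the NaN comes from. If a NaN stands for a term that does not occur in that document (for example 0 * Inf from an undefined idf), 0 is the weight that term would have anyway, so nothing the classifier sees changes. If a whole document ends up with no usable terms (for example 0/0 during length normalization), replacing its NaNs with 0 leaves an all-zero row, and the model's prediction for it rests on no term evidence at all. A minimal sketch for finding such rows, assuming the weights fit in memory as a dense matrix (as.matrix works on a DocumentTermMatrix); empty_rows is a hypothetical helper and the weights are toy values:

```r
# Hypothetical helper: return the (named) indices of documents whose
# weight vector is entirely zero, e.g. after NaN values were set to 0.
empty_rows <- function(x) {
  x <- as.matrix(x)                 # densify; fine for small matrices
  which(rowSums(x != 0) == 0)
}

# Toy tf-idf weights after cleaning: doc3 had only NaN weights, now all 0.
w <- matrix(c(0.5, 0,   0,
              0,   0.5, 0,
              0,   0,   0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3), c("alpha", "beta", "gamma")))

print(empty_rows(w))
```

Documents flagged this way are worth reviewing by hand, since any label the model assigns to them is effectively a default.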
Any help much appreciated! Thanks!