I am trying to do topic modeling on a set of articles. The raw data contains a lot of backslashes and numbers, so I cleaned it first. Even after removing punctuation, backslashes, and numbers, I still get backslashes along with numbers among the top terms of topic 1. The code I used for the preprocessing is:
library(tm)
# Convert to lower case
articles <- tm_map(articles, content_transformer(tolower))
# Remove numbers
articles <- tm_map(articles, removeNumbers)
# Remove common English stopwords
articles <- tm_map(articles, removeWords, stopwords("english"))
# Remove punctuation
articles <- tm_map(articles, removePunctuation)
# Eliminate extra white space
articles <- tm_map(articles, stripWhitespace)
# Replace literal backslashes with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
articles <- tm_map(articles, toSpace, "\\\\")
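A small base-R check may help narrow this down. It is only a sketch and rests on an assumption not confirmed by the data: that the `\003` shown in the terms is R's printed escape for a single control character, not a literal backslash followed by digits. The sample string `x` is hypothetical.

```r
# Hypothetical sample: "\003" is ONE control character (octal 003),
# which R prints as the escape \003. It contains no literal backslash.
x <- "wkh\003 ri\003 design"

# The pattern "\\\\" is the regex \\ and matches only a literal backslash,
# so under this assumption it leaves the string unchanged.
no_backslash <- gsub("\\\\", " ", x)
identical(no_backslash, x)

# Replacing control characters instead, via the POSIX class [[:cntrl:]],
# then collapsing repeated spaces.
cleaned <- gsub("[[:cntrl:]]", " ", x)
cleaned <- gsub(" +", " ", cleaned)
cleaned
```

If the assumption holds, the same `[[:cntrl:]]` pattern could be passed to `toSpace` in place of `"\\\\"`. Note that this would only strip the control characters; tokens such as `wkh` and `ri` would still remain as terms.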
Even after this cleaning, backslashes and numbers still show up among the top terms of the topic:
design
robot
class
medical
device
wkh\003
students
dcbl
ri\003
course
The backslashes and numbers in the topic terms should not be there. How can I remove them?