
I have been trying to get a word cloud up and running for a WhatsApp chat. The code below is what I have been using:

setwd("E:/")
library(ggplot2)
library(tm)
library(wordcloud)
library(syuzhet)

texts <- readLines("chat.txt")

docs <- Corpus(VectorSource(texts))
docs


# Replace every match of `pattern` with a space
trans <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs,trans,"/")
docs <- tm_map(docs,trans,"@")
docs <- tm_map(docs,trans,"\\|")
docs <- tm_map(docs,content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs,removeWords, stopwords("en"))
docs <- tm_map(docs,removePunctuation)
docs <- tm_map(docs,stripWhitespace)
docs <- tm_map(docs,stemDocument)

dtm <- TermDocumentMatrix(docs)
mat <- as.matrix(dtm)
v <- sort(rowSums(mat), decreasing = TRUE)

d<- data.frame(word= names(v), freq=v)
head(d,10)


set.seed(1056)
wordcloud(words = d$word, freq = d$freq, min.freq = 1, 
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8,"Dark2"))

However, certain character sequences such as "Ëœâ€", "ÂÃ", and "ËœÂ" are still not being removed, and they distort the word cloud.
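Sequences like these usually look like UTF-8 text (emoji, curly quotes) that was decoded as Windows-1252, although that is an assumption and not something confirmed for this chat file. Under that assumption, a minimal sketch of stripping everything non-ASCII before building the corpus might look like the following. The sample lines are hypothetical stand-ins for chat.txt, not the real data:

```r
# Hypothetical sample lines standing in for chat.txt (not the real data).
# "\u2019" is a curly apostrophe and "\U0001F600" is an emoji; both are
# multi-byte in UTF-8, which is the assumed source of the stray characters.
texts <- c("Don\u2019t forget the meeting", "See you there \U0001F600")

# Convert to ASCII and substitute an empty string for every byte that
# cannot be represented, which drops the non-ASCII characters.
clean <- iconv(texts, from = "UTF-8", to = "ASCII", sub = "")
clean
```

With the real file, reading it as `readLines("chat.txt", encoding = "UTF-8")` and applying the same `iconv()` call to `texts` before `Corpus(VectorSource(texts))` would be the equivalent step, again assuming the export really is UTF-8.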

JohnDoe
  • Which line exactly did you think would remove those characters? When asking for help, it is easier if you provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data, so that the problem can be identified and possible solutions can be tested. – MrFlick Aug 22 '17 at 20:33

0 Answers