I am new to R and working through a lot of tutorials; right now I am on word clouds.
I have run into a common R encoding problem: UTF-8 text is not displayed as expected.
I am trying to create a word cloud from a large body of text in a .txt file (Ukrainian, UTF-8 encoded), and the resulting cloud is completely wrong.
Here is the part of my code where I set the encoding:
library(tm)  # provides Corpus, VectorSource, TermDocumentMatrix, inspect
text <- readLines(file.choose())
Encoding(text) <- "UTF-8"
docs <- Corpus(VectorSource(text))
inspect(docs)
The text is displayed as expected in the console (in Ukrainian, with all special symbols).
However, when I create a term-document matrix and then a data frame from it, the output has the wrong encoding:
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(d, 10)
What I see in the console:
> head(d, 10)
word freq
РЅР РЅР 1856
СЃС СЃС 1668
СЂР СЂР 1576
РЅС РЅС 1162
РІР РІР 1119
РґР РґР 1112
РјР РјР 994
РѕР РѕР 857
РєС РєС 809
РёС РёС 788
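To me the garbled words look like UTF-8 bytes that were decoded as Windows-1251, but that is only my guess from the repeated "Р"/"С" pattern. Here is a minimal sketch that reproduces a similar effect with base R, in case it helps narrow things down. The sample word is my own, and I am assuming that the iconv build on the machine recognises the name "CP1251".

```r
# "на" (a common Ukrainian word), written with Unicode escapes so that
# the script does not depend on the encoding of the source file
word <- "\u043d\u0430"

# The raw UTF-8 bytes of the word: d0 bd d0 b0
print(charToRaw(word))

# Treat those same bytes as Windows-1251 and convert them to UTF-8.
# Assumption: "CP1251" is a name this iconv implementation knows.
garbled <- iconv(word, from = "CP1251", to = "UTF-8")
print(garbled)
```

If the printed result resembles the words in my data frame, that would suggest the bytes are fine and only the decoding step is wrong.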
I tried changing the locale and a few other suggestions I found on Stack Overflow, but nothing seems to work.
What could be the problem? What am I missing?
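For anyone who wants to check the same things on their machine, here is a rough sketch of the kind of diagnostics involved: inspecting the locale and reading a file with the encoding declared at read time. The temporary file and the sample sentence are stand-ins for my real data, and this only checks the input side; I do not claim it fixes what happens later in tm.

```r
# Show the current character-type locale and whether R considers it UTF-8
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[["UTF-8"]])

# Write a small UTF-8 sample file (a stand-in for the real .txt file)
sample_path <- tempfile(fileext = ".txt")
sample_text <- "\u043f\u0440\u0438\u0432\u0456\u0442 \u0441\u0432\u0456\u0442"
writeLines(enc2utf8(sample_text), sample_path, useBytes = TRUE)

# Read it back, declaring the encoding while reading instead of afterwards
lines_utf8 <- readLines(sample_path, encoding = "UTF-8")

# Check how R has marked the strings and whether the bytes are valid UTF-8
print(Encoding(lines_utf8))
print(validUTF8(lines_utf8))
```

If the locale is not UTF-8 while the strings are valid UTF-8, the mismatch between the two might be where the problem comes from.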
Thanks!