I think the tm package is great because it has so many functions that make NLP simpler to implement. However, I am new to it and have run into a roadblock. Could someone help? I ran the following:
sms_clean <- tm_map(sms_corpus, content_transformer(tolower))
It gave me the following error message: Error in FUN(content(x), ...) : invalid input 'FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, �1.50 to rcv' in 'utf8towcs'
I believe the text contains special characters or emojis that are not valid UTF-8, and that this is causing the problem. So I tried specifying the encoding when importing the file, as suggested here.
sms_raw <- read.csv('spam.csv',
                    encoding = 'Latin-1',
                    stringsAsFactors = FALSE)
It gave me this error message: Error in FUN(content(x), ...) : invalid multibyte string 1
I also tried the following, which I found in another Stack Overflow post:
usableText <- str_replace_all(sms_corpus,"[^[:graph:]]", " ")
and
dataset <- iconv(sms_corpus, 'UTF-8', 'ASCII')
Neither one helped.
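A minimal sketch of what might work, assuming the file is actually Latin-1 encoded: convert the text vector itself (rather than the corpus object) to UTF-8 with iconv() before building the corpus. The sample vector below is a hypothetical stand-in for the text column of spam.csv, and the names sms_text, sms_utf8 and sms_lower are made up for illustration.

```r
# Hypothetical stand-in for the text column of spam.csv.
# The pound sign is built as a single Latin-1 byte (0xA3), which is
# not valid UTF-8 on its own and mimics the character in the error above.
pound <- rawToChar(as.raw(0xA3))
sms_text <- c("FreeMsg Hey there darling",
              paste0("std chgs to send, ", pound, "1.50 to rcv"))

# Assumption: the source encoding is Latin-1. Convert to UTF-8;
# sub = "" drops any bytes that cannot be converted instead of returning NA.
sms_utf8 <- iconv(sms_text, from = "latin1", to = "UTF-8", sub = "")

# Every element should now be valid UTF-8, so tolower() no longer fails.
sms_lower <- tolower(sms_utf8)
```

If this runs cleanly on the real column, the corpus could then be built from the converted vector, for example with VCorpus(VectorSource(sms_utf8)), before calling tm_map().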