I am in dire need. I have a corpus that I have converted into a common language, but some of the words were not properly converted into English. As a result, the corpus still contains non-ASCII characters such as U+00F8 (ø).
I am using the quanteda package in R, and I imported my texts using this code:
EUCorpus <- corpus(textfile(file="/Users/RiohBurke/Documents/RStudio/PROJECT/*.txt"), encodingFrom = "UTF-8-BOM")
My corpus consists of 166 documents. Having imported the documents into R, what would be the best way to get rid of these non-ASCII characters?
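To make the problem concrete, here is a minimal sketch on a made-up character vector (a stand-in, not my actual files). It relies on base R's iconv, which is only one possible approach and not anything quanteda-specific. The variable names are hypothetical, and the exact behaviour of iconv can differ between platforms.

```r
# Hypothetical stand-in for two imported documents (not the real corpus)
texts <- c("The K\u00f8benhavn summit", "Plain ASCII text")

# Converting to ASCII without a substitute yields NA for any string
# that contains a character with no ASCII equivalent
has_non_ascii <- is.na(iconv(texts, from = "UTF-8", to = "ASCII"))

# sub = "" drops every non-convertible byte instead of returning NA
cleaned <- iconv(texts, from = "UTF-8", to = "ASCII", sub = "")

print(has_non_ascii)
print(cleaned)
```

Dropping the characters this way loses information (the ø simply disappears), so I am not sure it is the best option for all 166 documents. Presumably the same call could be applied to the character vector of texts before it is passed to corpus().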