I'm working with Unicode text in R using the text mining package tm. I'd like Unicode characters to survive being read into the program, but I can't find the argument I'm missing. Here's an example of Unicode text that gets mangled as soon as it is read into a corpus:
library(tm)
u <- VectorSource("The great Chāṇakya (350–283 BC).", encoding = "UTF-8")
v <- Corpus(u)
inspect(v)
## [[1]]
## The great Chaṇakya (350–283 BC). <--The ā has been coerced to "a"
writeCorpus(v, 'test.txt')
## yields: The great Cha<U+1E47>akya (350–283 BC).
I've also tried UTF-16, with the same result. Is there a way to pass this text through tm without the characters being mangled?