Whereas R seems to handle Unicode characters well internally, I'm not able to output a data frame in R with such UTF-8 Unicode characters. Is there any way to force this?
data.frame(c("hīersumian","ǣmettigan"))->test
write.table(test,"test.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")
The output text file reads:
hiersumian <U+01E3>mettigan
I am using R version 3.0.2 in a Windows environment (Windows 7).
EDIT
It's been suggested in the answers that R is writing the file correctly in UTF-8, and that the problem lies with the software I'm using to view the file. Here's some code where I'm doing everything in R. I'm reading in a text file encoded in UTF-8, and R reads it correctly. Then R writes the file out in UTF-8 and reads it back in again, and now the correct Unicode characters are gone.
read.table("myinputfile.txt",encoding="UTF-8")->myinputfile
myinputfile[1,1]
write.table(myinputfile,"myoutputfile.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")
read.table("myoutputfile.txt",encoding="UTF-8")->myoutputfile
myoutputfile[1,1]
Console output:
> read.table("myinputfile.txt",encoding="UTF-8")->myinputfile
> myinputfile[1,1]
[1] hīersumian
Levels: hīersumian ǣmettigan
> write.table(myinputfile,"myoutputfile.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")
> read.table("myoutputfile.txt",encoding="UTF-8")->myoutputfile
> myoutputfile[1,1]
[1] <U+FEFF>hiersumian
Levels: <U+01E3>mettigan <U+FEFF>hiersumian
>