Can someone please suggest the best way to import data containing Vietnamese characters into an R data frame, so that the characters are displayed correctly? The data I need to import is a longer version of the column below:
Student_name
PHẠM THANH
PHẠM VĂN
NGUYỄN TUẤN
NGUYỄN VĂN
VŨ NGỌC
I tried many options, including saving the data as Unicode text (.txt) and importing it into R with encoding = "UTF-8" specified.
With read.csv or read.table, I get the error message
In read.table("Stu.txt", header = TRUE, encoding = "UTF-8") :
  line 1 appears to contain embedded nulls
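For anyone trying to reproduce this, here is a minimal sketch of what I suspect might be going on. It rests on an assumption I have not verified: that the Unicode text export is UTF-16LE and tab-delimited, which would be consistent with the embedded-nulls message. The sketch writes a few of the names to a temporary UTF-16LE file and reads them back using the fileEncoding argument of read.table, which re-encodes the file on input (the encoding argument only marks the strings). The temporary file and the object names students and back are made up for illustration.

```r
# Assumes the R session can represent these characters (e.g. a UTF-8 locale).
students <- data.frame(
  Student_name = c("PHẠM THANH", "NGUYỄN TUẤN", "VŨ NGỌC"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited UTF-16LE file to mimic the (assumed) Unicode text export.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "w", encoding = "UTF-16LE")
write.table(students, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# fileEncoding tells read.table how the file is encoded, so it is converted
# to the native encoding while reading.
back <- read.table(path, header = TRUE, sep = "\t",
                   fileEncoding = "UTF-16LE", stringsAsFactors = FALSE)
print(back)
```

If the real file starts with a byte-order mark, fileEncoding = "UTF-16" may be the better choice, but I am not sure about that either.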
Saving the data as an MS Excel file and importing it with read.xlsx (package xlsx), I can read the data, but without specifying an encoding I get weird output, as shown:
Student_name
1 PHẠM THANH
2 PHẠM VĂN
3 NGUYỄN TUẤN
4 NGUYỄN VĂN
5 NGUYỄN VĂN
6 VŨ NGỌC
With read.xlsx and encoding = "UTF-8", the text is read as UTF-8, but the Vietnamese characters are not rendered. Instead, the output shows their Unicode code points enclosed in less-than and greater-than signs: PH<U+1EA0>M THANH and so on.
I am running R through RStudio, version 0.99.467, on Windows 7.
Thank you.