I'm confused about why certain characters (e.g. "Ě", "Č", and "ŝ") lose their diacritical marks in a data frame, while others (e.g. "Š" and "š") do not. My OS is Windows 10.

In my sample code below, a vector czechvec holds 11 single-character strings, all accented Latin characters. R displays those characters properly. Then a data frame mydf is created with czechvec as its second column (wrapped in I() so it isn't converted to a factor). But when R displays mydf, or any full row of mydf, it reduces most of these characters to their plain-ASCII equivalents; e.g. mydf[3,] shows the character as "E", not "Ě". Yet when I subscript by both row and column, e.g. mydf[3,2], R properly shows the accented character ("Ě").

Why should it make a difference whether R displays a whole row or a single cell? And why are some characters, like "Š", completely unaffected? Also, when I write this data frame to a file, the accents are lost entirely, even though I specify fileEncoding="UTF-8".
> charvals <- c(193, 269, 282, 268, 262, 263, 348, 349, 350, 352, 353)
> hexvals <- as.hexmode(charvals)
> czechvec <- unlist(strsplit(intToUtf8(charvals), ""))
> czechvec
[1] "Á" "č" "Ě" "Č" "Ć" "ć" "Ŝ" "ŝ" "Ş" "Š" "š"
>
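For what it's worth, the strings should be marked as UTF-8 internally, since intToUtf8() always returns a UTF-8 string and strsplit() preserves that marking; a quick check (not part of the transcript above) would be:

Encoding(czechvec)  # I'd expect "UTF-8" for each of the 11 elements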
> mydf = data.frame(dec=charvals, char=I(czechvec), hex=I(format(hexvals, width=4, upper.case=TRUE)))
> mydf
dec char hex
1 193 Á 00C1
2 269 c 010D
3 282 E 011A
4 268 C 010C
5 262 C 0106
6 263 c 0107
7 348 S 015C
8 349 s 015D
9 350 S 015E
10 352 Š 0160
11 353 š 0161
> mydf[3,2]
[1] "Ě"
> mydf[3,]
dec char hex
3 282 E 011A
>
> write.table(mydf, file="myfile.txt", fileEncoding="UTF-8")
>
> df2 <- read.table("myfile.txt", stringsAsFactors=FALSE, fileEncoding="UTF-8")
> df2[3,2]
[1] "E"
Edited to add: Per Ernest A's answer, this behaviour is not reproducible on Linux, so it must be a Windows-specific issue. (I'm using R 3.4.1 for Windows.)
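Edited again to add: for anyone else hitting this, a workaround I'm considering (untested on my machine, adapted from general advice about writing UTF-8 files on Windows) is to open the connection myself and write the UTF-8 bytes directly with useBytes=TRUE, so nothing gets translated through the native codepage on the way out. This is only a minimal sketch to test whether the bytes can be written at all; "utf8test.txt" is just an illustrative file name.

# Sketch of a possible workaround (unverified): write raw UTF-8 bytes,
# bypassing the connection's re-encoding step.
con <- file("utf8test.txt", open = "w", encoding = "native.enc")
writeLines(enc2utf8(czechvec), con = con, useBytes = TRUE)
close(con)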