I'm trying to create a table using formattable. One of my data columns contains a Czech surname with two letters R does not recognise: ů and č. These two letters are replaced by ? in my table.

I have changed my system locale using the code below, but R still does not recognise these symbols.

Sys.setlocale("LC_CTYPE", "Czech")
[1] "Czech_Czechia.1250"

Edit: I have successfully changed my editor's default text encoding to UTF-8 (Tools > Global Options > Code > Saving > Default text encoding).

However, when I attempted to change my locale to UTF-8 I received an error. Below are my code and resulting error message.

Sys.setlocale("LC_ALL",'en_US.UTF-8')


OS reports request to set locale to "en_US.UTF-8" cannot be honored
Tom Oliver
  • Setting the locale would only work if you also saved the scripts using 1250 as the codepage, which of course will fail for *other* codepages. Set the locale to UTF8 instead and ensure the scripts are saved as UTF8 too. No special encoding is necessary in that case, as you can see: SO HTML pages use UTF8. If you check this page's source code you'll see that no special encoding is used for ů and č, or for αυτό εδώ. – Panagiotis Kanavos May 30 '19 at 14:11
  • [This](https://cran.r-project.org/doc/manuals/R-exts.html#Encoding-issues) document explains in more detail how R handles non-ASCII encoding – ihatecsv May 30 '19 at 14:13
  • @ihatecsv the short of it is "use UTF8 for your script and locale". It's not about R itself, it's about the C++ functions. If you use UTF8 for your locale and ensure RStudio or whatever editor you use saves UTF8 files, there's no problem – Panagiotis Kanavos May 30 '19 at 14:14
  • @TomOliver you'll find quite a few similar SO questions. You have to ensure that your editor saves scripts as UTF8 too, otherwise the text will be mangled before R has a chance to parse it. RStudio started using UTF8 by default only [a couple of years ago](https://support.rstudio.com/hc/en-us/articles/200532197-Character-Encoding). Notice how `UTF8` became the `System default` in that screenshot from 2017 – Panagiotis Kanavos May 30 '19 at 14:18
  • @TomOliver another thing that causes confusion is that editors like RStudio may not display Unicode characters properly, replacing them with their R encodings [as shown in this SO question from 2014](https://stackoverflow.com/questions/23324872/rstudio-not-picking-the-encoding-im-telling-it-to-use-when-reading-a-file). This was fixed a couple of years ago too, and now RStudio displays the actual text – Panagiotis Kanavos May 30 '19 at 14:26
  • UPDATE: I was able to rectify this problem more simply by opening the data as a csv file in WordPad and saving it with "ANSI" selected in the "Encoding" field. When loaded into R, the letters ů and č were recognised by RStudio. – Tom Oliver Jun 04 '19 at 10:43
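
The advice in the comments (keep the files in UTF-8 and tell R so explicitly) can be sketched roughly as follows. This is only a minimal illustration, not the asker's actual script: the temporary file, the column name `surname`, and the sample value are all hypothetical. It writes a one-column CSV as raw UTF-8 bytes, reads it back with `read.csv(..., encoding = "UTF-8")` so that the strings are marked as UTF-8, and then inspects the result.

```r
# Sample surname built from Unicode escapes so the script itself stays ASCII
# (hypothetical value; \u016f is the letter u with ring, \u010d is c with caron).
surname <- "K\u016f\u010dera"

# Write a tiny CSV as raw UTF-8 bytes, bypassing any locale conversion.
path <- tempfile(fileext = ".csv")  # hypothetical stand-in for the real data file
con <- file(path, open = "wb")
writeLines(c("surname", enc2utf8(surname)), con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" marks the strings read from the file as UTF-8.
df <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

Encoding(df$surname)      # declared encoding of the column
utf8ToInt(df$surname[1])  # Unicode code points of the surname that was read
```

If the file is instead saved in a Windows code page, `read.csv(path, fileEncoding = "CP1250")` would be the analogous call. Which encoding names are accepted depends on the platform's iconv; `iconvlist()` shows the names available on a given system.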

0 Answers