What?

An .Rmd file renders without errors via knitr (or rmarkdown) on Linux. Related material (i.e. child R scripts and CSV input data) is all encoded in UTF-8.

Rendering the same file on Windows (the script actually lives inside a cloned git repository) does not render all characters cleanly, since the system encoding there is Windows-1252.
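The mismatch can be reproduced on any platform. The following is only a minimal sketch, assuming an iconv build that recognises the encoding name "CP1252"; the variable names are illustrative. It deliberately reinterprets the UTF-8 bytes of a string as Windows-1252, which is what effectively happens when UTF-8 input is processed under a Windows-1252 assumption.

```r
# A string containing U+00E9; the \u escape makes R store it as UTF-8
original <- "sans r\u00e9serves"

# In UTF-8 the accented letter is stored as two bytes
accent_bytes <- charToRaw("\u00e9")
print(accent_bytes)

# Deliberately misinterpret the UTF-8 bytes as Windows-1252 (CP1252):
# each of the two bytes is now taken to be a character of its own
misread <- iconv(original, from = "CP1252", to = "UTF-8")
print(misread)
```

One accented letter turning into two unrelated characters is the typical symptom of this kind of mismatch.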

Examples

For example, the string "sans réserves", read from a CSV into a data.frame column, is typeset as "sans rÃ©serves". To read it correctly, it suffices to add encoding='UTF-8' to the read.csv call when reading in the data.
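A minimal sketch of the read.csv fix, assuming a small made-up CSV written to a temporary file (the file, the column names and the data are hypothetical, not the original input):

```r
# Hypothetical sample data, written as UTF-8 bytes to a temporary CSV
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,remark", "1,sans r\u00e9serves"), csv_path, useBytes = TRUE)

# encoding = "UTF-8" marks the strings that are read as UTF-8,
# so their bytes are not interpreted in the session's native encoding
df <- read.csv(csv_path, encoding = "UTF-8", stringsAsFactors = FALSE)

# Inspect how R has marked the column, then the value itself
print(Encoding(df$remark))
print(df$remark)
```

read.csv also accepts fileEncoding, which re-encodes the input while reading instead of only marking it; which of the two suits a given setup better is a judgment call.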

Another example, this time a string literal among other lines of R code, is "Trésorier Général". It is typeset as "TrÃ©sorier GÃ©nÃ©ral". Fortunately, the following advice

read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8"))

taken from https://stackoverflow.com/a/15714617/1172302, works and the string is rendered as expected.
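The same idea as a self-contained sketch, assuming a made-up child script written to a temporary file (the chunk label "titles" and the script content are hypothetical). The knitr call is guarded, so the base R part also runs where knitr is not installed.

```r
# Hypothetical child script, written as UTF-8 bytes to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(
  c("## ---- titles", "title <- \"Tr\u00e9sorier G\u00e9n\u00e9ral\""),
  script_path,
  useBytes = TRUE
)

# Declare the encoding while reading, so the lines are marked as UTF-8
script_lines <- readLines(script_path, encoding = "UTF-8")
print(Encoding(script_lines[2]))

# Hand the pre-read lines to knitr instead of letting it read the file
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::read_chunk(lines = script_lines)
}
```

The encoding is declared once, at the point where the file is read, and knitr only ever sees strings that are already marked.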

Related

[Update] There are some related Q&As, but they are more than 2-3 years old. Also, this page, https://support.rstudio.com/hc/en-us/articles/200532197-Character-Encoding, points to the very issue.

Questions

Is there another, easier way to overcome this issue regarding UTF-8 and Windows, inside R? Any recommendations on how to approach such a problem? I am trying to follow a "one source for all" principle.

PS: An interesting read: https://superuser.com/a/221602/128768

Nikos Alexandris
  • *"Is there another, easier way to overcome this issue"* - What **is** *"this issue"*? It seems you found out already, that for text to render properly, the renderer needs to know about its character encoding. Truth be told, [there ain't no such thing as Plain Text](http://www.joelonsoftware.com/articles/Unicode.html). – IInspectable Aug 18 '16 at 17:09
  • @IInspectable Right, the renderer needs to know. The issue is altogether why bother to set anything for reading files and `read_chunk`ing the scripts? Why not say, in one place, all this is UTF-8, please render it correctly. – Nikos Alexandris Aug 18 '16 at 20:30
  • I don't know, if this is possible with R, but you could try to set the [code page](https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756.aspx) to 65001 on Windows, to indicate UTF-8. – IInspectable Aug 18 '16 at 20:57
  • @IInspectable Thank you. Here is the _issue_: https://support.rstudio.com/hc/en-us/articles/200532197-Character-Encoding – Nikos Alexandris Aug 18 '16 at 22:01
  • If I understand, you can set the default encoding in the IDE. Did I miss something, or are you looking for a solution that doesn't require the user to change their environment? – IInspectable Aug 18 '16 at 22:33
  • Each attempt to use `Sys.setlocale` returns `OS reports request to set locale to "UTF-8" cannot be honored` (in Windows 7), same as reported in this question: http://stackoverflow.com/q/20571147/1172302. Reportedly, my colleague didn't get it working by setting this in `.Rprofile` either. – Nikos Alexandris Aug 19 '16 at 04:15
  • I have the same bug with R, knitr and LaTeX... have you got a solution? – Hedjour Jul 24 '18 at 13:16
  • @Hedjour No. I was trying to help a colleague at the time. I would still be interested to know how to solve this, though it doesn't bother me directly anymore. And, I don't run Windows to test. – Nikos Alexandris Jul 24 '18 at 13:36
  • Mojibake (`Ã©` for `é`) usually implies that latin1/cp1252 is incorrectly involved in the processing. Is a database involved? – Rick James Nov 07 '18 at 21:35

0 Answers