UTF-8 with R Markdown, knitr and Windows

Question

What?

An .Rmd file renders without errors via knitr (or rmarkdown) on Linux. All related material (i.e. child R scripts and CSV input data) is encoded in UTF-8.

Running the same script on Windows (the script lives inside a cloned git repository) does not render all characters cleanly, since the system encoding there is Windows-1252.
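The symptom can be reproduced on any platform with base R's `iconv`. This is a minimal sketch, assuming the garbling comes from UTF-8 bytes being interpreted as Windows-1252; the variable names are only illustrative.

```r
# The original string, written with \u escapes so this snippet stays ASCII-only.
original <- "Tr\u00e9sorier G\u00e9n\u00e9ral"

# Treat the UTF-8 bytes as if they were Windows-1252 and convert to UTF-8:
# every two-byte accented letter turns into two separate characters.
garbled <- iconv(original, from = "CP1252", to = "UTF-8")

# Reverse the mistake: map the characters back to single bytes, then
# declare that those bytes are UTF-8.
repaired <- iconv(garbled, from = "UTF-8", to = "CP1252")
Encoding(repaired) <- "UTF-8"

print(garbled)
print(repaired)
```

If `repaired` matches `original`, the damage was a single wrong decoding step and no information was lost.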

Examples

For example, the string "sans réserves", read from a CSV into a data.frame column, is typeset as "sans rÃ©serves". To read it correctly, it suffices to add encoding='UTF-8' to the read.csv call.
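A minimal sketch of that fix, assuming a one-column CSV; the temporary file and the column name `label` are made up for illustration:

```r
# Write a small CSV as raw UTF-8 bytes, independent of the session locale.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("label", "sans r\u00e9serves"), csv_path, useBytes = TRUE)

# encoding = "UTF-8" marks the strings that are read as UTF-8,
# so R does not assume the native encoding (e.g. Windows-1252).
df <- read.csv(csv_path, encoding = "UTF-8", stringsAsFactors = FALSE)

print(df$label)
print(Encoding(df$label))

unlink(csv_path)
```

`read.csv` also accepts `fileEncoding = "UTF-8"`, which re-encodes the input to the native encoding instead of only marking the strings. Which of the two suits a given pipeline better is a judgment call.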

Another example concerns a string that appears directly among the R code lines: "Trésorier Général". It is typeset as "TrÃ©sorier GÃ©nÃ©ral". Fortunately, the following advice

read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8"))

taken from https://stackoverflow.com/a/15714617/1172302, works and the string is rendered as expected.
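What that call relies on can be checked with base R alone. The sketch below assumes a hypothetical child script with one labelled chunk (`titles`); the `read_chunk` call only runs if knitr is installed.

```r
# Hypothetical child script, written as raw UTF-8 bytes.
script_path <- tempfile(fileext = ".R")
writeLines(
  c("## ---- titles",
    "title <- \"Tr\u00e9sorier G\u00e9n\u00e9ral\""),
  script_path, useBytes = TRUE
)

# encoding = "UTF-8" marks the non-ASCII lines as UTF-8 instead of
# leaving them to be interpreted in the native encoding.
script_lines <- readLines(script_path, encoding = "UTF-8")
print(Encoding(script_lines))

# Hand the already-decoded lines to knitr, as in the advice above.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::read_chunk(lines = script_lines)
}

unlink(script_path)
```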

Related

[Update] There are some related Q&As, but they are more than 2-3 years old. This page, https://support.rstudio.com/hc/en-us/articles/200532197-Character-Encoding, also describes the very issue.

Questions

Is there another, easier way to overcome this UTF-8 issue on Windows, from inside R? Any recommendations on how to approach such a problem? I am trying to follow a one-source-for-all principle.

PS: An interesting read: https://superuser.com/a/221602/128768

*"Is there another, easier way to overcome this issue"* - What **is** *"this issue"*? It seems you found out already, that for text to render properly, the renderer needs to know about its character encoding. Truth be told, [there ain't no such thing as Plain Text](http://www.joelonsoftware.com/articles/Unicode.html). — IInspectable, Aug 18 '16 at 17:09
@IInspectable Right, the renderer needs to know. The issue is altogether why bother to set anything for reading files and `read_chunk`ing the scripts? Why not say, in one place, all this is UTF-8, please render it correctly. — Nikos Alexandris, Aug 18 '16 at 20:30
I don't know, if this is possible with R, but you could try to set the [code page](https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756.aspx) to 65001 on Windows, to indicate UTF-8. — IInspectable, Aug 18 '16 at 20:57
@IInspectable Thank you. Here the _issue_: https://support.rstudio.com/hc/en-us/articles/200532197-Character-Encoding — Nikos Alexandris, Aug 18 '16 at 22:01
If I understand, you can set the default encoding in the IDE. Did I miss something, or are you looking for a solution that doesn't require the user to change their environment? — IInspectable, Aug 18 '16 at 22:33
Each attempt to use `Sys.setlocale` returns `OS reports request to set locale to "UTF-8" cannot be honored` (in Windows 7), same as reported in this question: http://stackoverflow.com/q/20571147/1172302. Reportedly, my colleague didn't get it working by setting this in `.Rprofile` too. — Nikos Alexandris, Aug 19 '16 at 04:15
I have the same bug with R, knitr and LaTeX... have you got a solution? — Hedjour, Jul 24 '18 at 13:16
@Hedjour No. I was trying to help a colleague at the time. I would still be interested to know how to solve this, though it doesn't bother me directly anymore. And, I don't run Windows to test. — Nikos Alexandris, Jul 24 '18 at 13:36
Mojibake (`Ã©` for `é`) usually implies that latin1/cp1252 is incorrectly involved in the processing. Is a database involved? — Rick James, Nov 07 '18 at 21:35
