
UPDATE (April 2018):
The problem still persists with different settings and on different computers. I believe it affects all Unicode (UTF-8) characters.

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

PROBLEM:

My Rmd/R file is saved with UTF-8 encoding. Other sessionInfo() details:

Platform: x86_64-w64-mingw32/x64 (64-bit)
LC_CTYPE=English_Canada.1252

other attached packages:
[1] knitr_1.17

Here is a simple data frame that I need to print as a table in an HTML document, e.g. with kable(dt) or in any other way.

dt <- data.frame(
  name = c("Борис Немцов", "Martin Luter King"),
  year = c("2015", "1968")
)

Neither of the following works:

Way 1

If I keep the locale as is (i.e. LC_CTYPE = "English_Canada.1252"), I get this:

> dt;
name year
1 <U+0411><U+043E><U+0440><U+0438><U+0441> <U+041D><U+0435><U+043C><U+0446><U+043E><U+0432> 2015
2 Martin Luter King 1968
> kable(dt)
|name                                                                                      |year |
|:-----------------------------------------------------------------------------------------|:----|
|<U+0411><U+043E><U+0440><U+0438><U+0441> <U+041D><U+0435><U+043C><U+0446><U+043E><U+0432> |2015 |
|Martin Luter King                                                                         |1968 |

Note that <U+....> escape codes are printed instead of the characters.
Using dt$name <- enc2utf8(as.character(dt$name)) did not help.
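A quick way to check that the data itself is intact, and that only the console rendering in the cp1252 locale is at fault, is to inspect the code points directly. This is a minimal sketch, assuming the script is read as UTF-8; `x` is just an illustrative variable holding the first name from the example:

```r
# Illustrative copy of the first name from the data frame above
x <- enc2utf8("Борис Немцов")

# Number of characters (not bytes): 5 letters + space + 6 letters
nchar(x)
# 12

# Code points of the first five characters, written in the same
# notation R uses for the <U+....> escapes printed above
sprintf("U+%04X", utf8ToInt(x))[1:5]
# "U+0411" "U+043E" "U+0440" "U+0438" "U+0441"
```

If these match the escapes in the console output, the strings are stored correctly and the problem lies only in how they are displayed.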

Way 2

If I change the locale with Sys.setlocale("LC_CTYPE", "russian") (i.e. "Russian_Russia.1251"), I get this:

> dt;
name year
1      Áîðèñ Íåìöîâ 2015
2 Martin Luter King 1968

> kable(dt)
|name              |year |
|:-----------------|:----|
|Áîðèñ Íåìöîâ      |2015 |
|Martin Luter King |1968 |

Note that the characters have become gibberish.
Using print(dt, encoding = "windows-1251") or print(dt, encoding = "UTF-8") had no effect.
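The specific gibberish above is consistent with the Windows-1251 bytes of the Cyrillic text being displayed as if they were Windows-1252. A minimal sketch reproducing that effect with base R's iconv(), assuming the iconv implementation on your system accepts the "CP1251" and "CP1252" names (typical, but not guaranteed):

```r
x <- enc2utf8("Борис Немцов")

# Encode the text as Windows-1251 bytes (one byte per Cyrillic letter)
bytes_1251 <- iconv(x, from = "UTF-8", to = "CP1251", toRaw = TRUE)[[1]]
bytes_1251[1:5]
# c1 ee f0 e8 f1

# Reinterpret the very same bytes as Windows-1252 and convert to UTF-8
mojibake <- iconv(list(bytes_1251), from = "CP1252", to = "UTF-8")
mojibake
# "Áîðèñ Íåìöîâ"
```

This suggests the cp1251 bytes themselves are fine, and that something along the display path is still decoding them as cp1252.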

Any advice?

The closest matches I could find for this problem are the following links, but they did not help: http://blog.rolffredheim.com/2013/01/r-and-foreign-characters.html, https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows, https://www.smashingmagazine.com/2012/06/all-about-unicode-utf8-character-sets

I also tried saving my file with Windows-1251 encoding (instead of the current UTF-8 encoding) and using some other character conversion/processing packages. Nothing has helped so far.

UPDATE:

I opened a related question: How to change Sys.setlocale, when you get Error "request to set locale … cannot be honored"
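Sys.setlocale() signals the "cannot be honored" failure by returning an empty string (with a warning) rather than throwing an error, so a script can probe candidate locale names and keep the first one the OS accepts. A minimal sketch; `try_set_ctype()` is a hypothetical helper, and the candidate names are only examples ("russian" is a Windows locale name, and on Windows builds of R before 4.2 UTF-8 locales such as "en_US.UTF-8" are generally not available):

```r
# Hypothetical helper: try each candidate LC_CTYPE locale in turn and
# return the first one the OS accepts, or NA if none is accepted.
try_set_ctype <- function(candidates) {
  for (loc in candidates) {
    # Sys.setlocale() returns "" (and warns) when the request cannot be honored
    res <- suppressWarnings(Sys.setlocale("LC_CTYPE", loc))
    if (nzchar(res)) return(res)
  }
  NA_character_
}

old <- Sys.getlocale("LC_CTYPE")            # remember the current setting
try_set_ctype(c("en_US.UTF-8", "Russian_Russia.1251", "russian"))
invisible(Sys.setlocale("LC_CTYPE", old))   # restore it afterwards
```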

IVIM
  • I have no problems using my native locale `en_US.UTF-8` when printing to the console or knitting an HTML document. Using LaTeX is another story. – Martin Schmelzer Jan 18 '18 at 11:31
  • Thanks for trying. I tried to set my locale to what you have, `Sys.setlocale("LC_CTYPE", "en_US.UTF-8")`, but got this error: `OS reports request to set locale to "en_US.UTF-8" cannot be honored[1] ""`. This may explain why it works for you but not for me (my locale is `LC_CTYPE=English_Canada.1252`). So what can I do? – IVIM Jan 19 '18 at 14:44
  • I found two related suggestions from the knitr developer: https://stackoverflow.com/questions/15703702/is-there-a-knitr-option-to-force-utf-8-encoding-in-included-r-files and https://stackoverflow.com/questions/27982566/rmarkdown-utf-8-error-on-mutliple-operating-systems. The idea is to move the UTF-8 code into a _separate file_ and then read it from there: `con = file("TestSpanishText.R", encoding = "UTF-8"); read_chunk(con); close(con)` – IVIM Jan 19 '18 at 15:06
  • Can you try to set `Sys.setlocale(, "Russian")` in your `~/.Rprofile`? If you don't know what `.Rprofile` is, see https://bookdown.org/yihui/blogdown/global-options.html – Yihui Xie Jan 19 '18 at 16:57
  • Fantastic! I did that, and printing with `print(dt)` still showed the same gibberish, but printing with `kable(dt)` produced exactly what is needed! So the conclusion: running `Sys.setlocale("LC_CTYPE", "russian")` in the script is not sufficient. You have to put it in .Rprofile, and it works specifically with `kable()` (thanks to the `knitr` developer :) – IVIM Jan 20 '18 at 03:38

1 Answer

The only solution that worked was the one suggested by Yihui Xie (the knitr developer): create a file named .Rprofile containing the single line Sys.setlocale("LC_CTYPE", "russian") and place it in your home or working directory.
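For reference, such an .Rprofile could look like the following sketch. The platform guard is an assumption added here, not part of the original suggestion: "russian" is a Windows locale name and would only produce a warning on other systems. invisible() merely suppresses the printed return value at start-up.

```r
# .Rprofile, placed in your home directory or in the project's working directory.
# R sources this file at start-up, before the document is knitted.
if (.Platform$OS.type == "windows") {
  # Same call as suggested above; "russian" corresponds to Russian_Russia.1251 on Windows
  invisible(Sys.setlocale("LC_CTYPE", "russian"))
}
```

Restart the R session after creating the file, since .Rprofile is only read at start-up.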

However, please note that this works only when printing with kable(), i.e. with the help of the knitr package.
If you print with print(dt$name[1]), you still get Áîðèñ Íåìöîâ.
But if you use kable(dt$name[1]), you get what you need: Борис Немцов!

IVIM