I have found a very strange bug in how R handles the encoding of character constants.

main.R:

options(encoding = "UTF-8")
print(Sys.getlocale())
print(getOption("encoding"))

print("first run")
source("internal.R")
print("")

print("second run")
source("internal.R", encoding = "UTF-8")
print("")

internal.R:

print(Sys.getlocale())
print(getOption("encoding"))
char_constant="Тут не просто живут баги, тут у них гнездо"
print(Encoding(char_constant))

Now let's look at the output after pressing the Source button:

[1] "ru_RU.UTF-8/ru_RU.UTF-8/ru_RU.UTF-8/C/ru_RU.UTF-8/ru_RU.UTF-8"
[1] "UTF-8"
[1] "first run"
[1] "ru_RU.UTF-8/ru_RU.UTF-8/ru_RU.UTF-8/C/ru_RU.UTF-8/ru_RU.UTF-8"
[1] "UTF-8"
[1] "unknown"
[1] ""
[1] "second run"
[1] "ru_RU.UTF-8/ru_RU.UTF-8/ru_RU.UTF-8/C/ru_RU.UTF-8/ru_RU.UTF-8"
[1] "UTF-8"
[1] "UTF-8"
[1] ""

Notice the difference in encoding: "unknown" the first time and "UTF-8" the second time. There is an obvious small bug: source ignores the default value of its encoding parameter.
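The two runs can also be reproduced without separate files. This is only a minimal sketch: the temporary script and the helper `encoding_after_source` are made up for illustration, and the comparison is only attempted in a UTF-8 locale such as the ru_RU.UTF-8 one shown above. The result may differ between R versions, so no expected output is claimed here.

```r
# Write a one-line script whose string constant is stored as UTF-8 bytes.
script <- tempfile(fileext = ".R")
writeLines(enc2utf8('char_constant <- "\u0422\u0443\u0442"'), script,
           useBytes = TRUE)

# Hypothetical helper: source the script into a fresh environment, either
# relying on the default encoding argument or passing it explicitly, and
# report how the resulting constant is marked.
encoding_after_source <- function(path, explicit) {
  env <- new.env()
  if (explicit) {
    source(path, local = env, encoding = "UTF-8")
  } else {
    source(path, local = env)
  }
  Encoding(env$char_constant)
}

# Only meaningful when the session itself runs in a UTF-8 locale.
if (l10n_info()[["UTF-8"]]) {
  print(encoding_after_source(script, explicit = FALSE))
  print(encoding_after_source(script, explicit = TRUE))
}
```
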

The real problem is that mixing different encodings in a data.table causes a lot of trouble, and RStudio produces a "UTF-8" constant when you execute a single line but an "unknown" constant when you source the whole file.

Does anybody have any idea what is going on and how to work around it?

R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin14.5.0 (64-bit)
Running under: OS X 10.12.4 (unknown)

locale:
[1] ru_RU.UTF-8/ru_RU.UTF-8/ru_RU.UTF-8/C/ru_RU.UTF-8/ru_RU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.3.0
Roman Luštrik
Anatoliy Orlov
  • So this behavior is unique to the console in RStudio and does not occur in raw R? – Roman Luštrik May 04 '17 at 17:35
  • No, in raw R the behaviour is the same. – Anatoliy Orlov May 04 '17 at 17:48
  • Have a look [here](http://stackoverflow.com/questions/41743949/utf-8-encoding-not-used-although-it-is-set-in-source) – Christoph May 04 '17 at 17:50
  • Yeah, same bug. In my case I tried setting the encoding in the first line, and it still does not work. The really strange thing for me is that it works with the encoding = "UTF-8" parameter, even though the default for this parameter is taken from getOption("encoding"), and that option returns the right value. – Anatoliy Orlov May 04 '17 at 17:56

1 Answer

On Windows, R's source function does not work with files that include characters that aren't part of the current system encoding. You may have trouble with RStudio's Run All and Source on Save commands, as they rely on source.

Take a look at: https://support.rstudio.com/hc/en-us/articles/200532197-Character-Encoding
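As a possible workaround, based only on what the question shows (the constant comes out marked as "UTF-8" when `encoding = "UTF-8"` is passed to source explicitly), a small wrapper can always pass the argument. The names `source_utf8` and `mark_utf8` are hypothetical helpers, not part of R or RStudio. `mark_utf8` assumes the strings already hold UTF-8 bytes, as they would in a UTF-8 locale reading a UTF-8 file; it only changes the declared encoding and does not convert anything.

```r
# Hypothetical wrapper: always pass the encoding explicitly instead of
# relying on the default taken from getOption("encoding").
source_utf8 <- function(file, ...) {
  source(file, encoding = "UTF-8", ...)
}

# Hypothetical helper: declare a character vector as UTF-8 so that all
# strings carry the same mark before they go into a data.table.
# Assumption: the bytes are already valid UTF-8. Pure ASCII strings stay
# "unknown", which is how R always reports them.
mark_utf8 <- function(x) {
  stopifnot(is.character(x))
  Encoding(x) <- "UTF-8"
  x
}

# Usage sketch, with the file and variable names from the question:
# source_utf8("internal.R")
# char_constant <- mark_utf8(char_constant)
```

Where a real conversion from the native encoding is needed rather than just re-marking, base R's `enc2utf8()` may be the more appropriate choice.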

Vitor Costa