I'm trying to get the day of the week, and have it work consistently in any locale. In locales with Latin alphabets, everything is fine.
Sys.getlocale()
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"
weekdays(Sys.Date())
## [1] "Tuesday"
I have two related problems with other locales.
If I set
Sys.setlocale("LC_ALL", "Arabic_Qatar")
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=Arabic_Qatar.1256;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
then I sometimes (correctly) get
weekdays(Sys.Date())
## [1] "الثلاثاء"
and sometimes get
weekdays(Sys.Date())
## [1] "ÇáËáÇËÇÁ"
depending upon my setup. The problem is, I can't figure out what is causing the difference.
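One observation that may help narrow this down: the garbled string looks like the Windows-1256 bytes of the correct Arabic text being displayed as if they were Windows-1252. The following is a minimal sketch, assuming that is what is happening. The byte values are written out by hand for illustration, and the variable names are my own.

```r
# Bytes of the Arabic word for Tuesday in Windows-1256 (written out by hand;
# an assumption about what weekdays() returns under Arabic_Qatar.1256).
cp1256_bytes <- as.raw(c(0xC7, 0xE1, 0xCB, 0xE1, 0xC7, 0xCB, 0xC7, 0xC1))
native_string <- rawToChar(cp1256_bytes)

# Interpreted as Windows-1252, the same bytes give the garbled Latin text.
as_latin <- iconv(native_string, from = "CP1252", to = "UTF-8")

# Interpreted as Windows-1256, they give the Arabic text.
as_arabic <- iconv(native_string, from = "CP1256", to = "UTF-8")

# The converted string carries an explicit encoding mark.
Encoding(as_arabic)
```

If this reading is right, the string itself is the same in both cases and only the code page used to display it differs.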
I thought it might be something to do with getOption("encoding"), but I've tried explicitly setting options(encoding = "native.enc") and options(encoding = "UTF-8") and it makes no difference.
I've tried several recent versions of R, and the problem is consistent across all of them.
At the moment, the string displays correctly in R GUI, but incorrectly when I use an IDE (Architect and RStudio tested).
What should I set to ensure that weekdays always displays correctly?
It may be helpful to know that weekdays(Sys.Date()) is equivalent to format(as.POSIXlt(Sys.Date()), "%A"), which calls an internal format.POSIXlt method.
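A small sketch of that equivalence, which also prints the declared encoding and the raw bytes of the result. Comparing those two values between R GUI and the IDE might show whether the string itself differs or only the way it is displayed. The variable names are my own.

```r
today <- Sys.Date()

via_weekdays <- weekdays(today)
via_format <- format(as.POSIXlt(today), "%A")

# Both routes should give the same string in the current locale.
identical(via_weekdays, via_format)

# Declared encoding and underlying bytes of the day name.
Encoding(via_weekdays)
charToRaw(via_weekdays)
```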
Secondly, changing the whole locale seems like overkill. I thought I should be able to set just the time category. However, if I set individual components of the locale, weekdays returns a string of question marks.
for(category in c("LC_TIME", "LC_CTYPE", "LC_COLLATE", "LC_MONETARY"))
{
  Sys.setlocale(category, "Arabic_Qatar")
  print(Sys.getlocale())
  print(weekdays(Sys.Date()))
}
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=Arabic_Qatar.1256;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
What parts of the locale affect how the weekdays are printed?
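Since the loop above changes the session locale cumulatively, experiments like this are easier to compare when each one restores the previous setting. A minimal sketch, assuming only LC_TIME needs to change; with_lc_time is a hypothetical helper, not something from base R.

```r
# Hypothetical helper: evaluate an expression with LC_TIME set to `locale`,
# then restore the previous setting even if the expression fails.
with_lc_time <- function(locale, expr) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", locale)
  force(expr)  # the argument is evaluated lazily, so this runs after the change
}

# The "C" locale is used here only because it should exist on every platform;
# on Windows one would pass e.g. "Arabic_Qatar" instead.
with_lc_time("C", weekdays(as.Date("2013-01-01")))
```

This does not explain the question marks, but it keeps one experiment from leaking into the next.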
Update: The problem seems to be Windows-related. When I run the code on a Linux box with locale "ar_QA.UTF8", the weekdays are displayed correctly.
Further update: As agstudy mentioned in his answer, setting locales under Windows is odd, since you can't just use ISO codes like "en-GB". For Windows 7/Vista/Server 2003/XP you can set a locale using setlocale language strings or National Language Support values. For Qatari Arabic, there is no setlocale language string, so we must use an NLS value. We have several choices:
Sys.setlocale("LC_TIME", "ARQ") # the language abbreviation name
Sys.setlocale("LC_TIME", "Arabic_Qatar") # corresponding to the language/country pair "Arabic (Qatar)"
Sys.setlocale("LC_TIME", "Arabic_Qatar.1256") # explicitly including the ANSI codepage
Sys.setlocale("LC_TIME", "Arabic") # would sometimes be a possibility too, but it defaults to Saudi Arabic
So the problem isn't that R cannot support Arabic locales under Windows (though I'm not entirely convinced of the robustness of Sys.setlocale).
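To check which of these names a given machine actually accepts, something like the following could be used. It is a sketch that relies on Sys.setlocale returning an empty string (with a warning) when a request cannot be honoured; try_locales is a hypothetical helper name.

```r
# Hypothetical helper: try each candidate locale name for one category and
# report what the OS resolved it to ("" means the request was rejected).
try_locales <- function(candidates, category = "LC_TIME") {
  old <- Sys.getlocale(category)
  on.exit(Sys.setlocale(category, old), add = TRUE)
  vapply(
    candidates,
    function(loc) suppressWarnings(Sys.setlocale(category, loc)),
    character(1)
  )
}

# On Windows the candidates would be the names listed above.
try_locales(c("ARQ", "Arabic_Qatar", "Arabic_Qatar.1256", "Arabic"))
```

The result is a named character vector, so rejected names show up as empty strings next to the names that were tried.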
Desperate last-ditch attempt: Trying to magically fix things by using the Windows Management Instrumentation Command-line tool (wmic) to change the OS locale doesn't work, since R doesn't appear to recognise the changes.
system("wmic os set locale=MS_4001")
## Updating property(s) of '\\PC402729\ROOT\CIMV2:Win32_OperatingSystem=@'
## Property(s) update successful.
system("wmic os get locale") # same as before