
I'm trying to find a reliable way of listing the locale codes that can be passed to Sys.setlocale.

The ?Sys.setlocale help page just states that the allowed values are OS dependent, and gives these examples:

Sys.setlocale("LC_TIME", "de")     # Solaris: details are OS-dependent
Sys.setlocale("LC_TIME", "de_DE.utf8")   # Modern Linux etc.
Sys.setlocale("LC_TIME", "de_DE.UTF-8")  # ditto
Sys.setlocale("LC_TIME", "de_DE")  # Mac OS X, in UTF-8
Sys.setlocale("LC_TIME", "German") # Windows
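Because the accepted names differ by platform, one option is to choose the name from the OS at run time. A minimal sketch, assuming the per-OS names from the help-page comments above are the ones you want; `german_time_locale()` is a hypothetical helper, and it assumes `Sys.info()[["sysname"]]` reports "Windows", "Darwin" (Mac) or "SunOS" (Solaris), with everything else treated like Linux:

```r
# Hypothetical helper: pick the German locale name that the ?Sys.setlocale
# examples suggest for the current OS. Not an exhaustive mapping.
german_time_locale <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
    Windows = "German",   # Windows long name
    Darwin  = "de_DE",    # Mac OS X, UTF-8 by default
    SunOS   = "de",       # Solaris
    "de_DE.UTF-8"         # assumed default: modern Linux etc.
  )
}

# Returns "" (with a warning) if the OS still refuses the name
Sys.setlocale("LC_TIME", german_time_locale())
```

This only encodes the help page's examples; it says nothing about which locales are actually installed.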

Under Linux, the possibilities can be retrieved using

locales <- system("locale -a", intern = TRUE)
##  [1] "C"                    "C.utf8"               "POSIX"               
##  [4] "af_ZA"                "af_ZA.utf8"           "am_ET"
##  ...
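On Linux these names mostly follow the POSIX pattern language[_territory][.codeset][@modifier]. A minimal base-R sketch that splits them into those parts, assuming that pattern; `parse_locale()` and its column names are hypothetical, and the example vector is just the first few values from the output above:

```r
# Split names like "af_ZA.utf8" into parts. Assumes the POSIX form
# language[_territory][.codeset][@modifier]; missing parts become "".
# Strings that don't match the pattern are returned unchanged in every column.
parse_locale <- function(x) {
  pattern <- "^([^_.@]+)(?:_([^.@]+))?(?:\\.([^@]+))?(?:@(.+))?$"
  part <- function(i) sub(pattern, paste0("\\", i), x, perl = TRUE)
  data.frame(
    locale    = x,
    language  = part(1),
    territory = part(2),
    codeset   = part(3),
    modifier  = part(4),
    stringsAsFactors = FALSE
  )
}

locales <- c("C", "C.utf8", "POSIX", "af_ZA", "af_ZA.utf8", "am_ET")
parse_locale(locales)
```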

I don't have Solaris or Mac machines to hand, but I guess that the Solaris and Mac equivalents can be generated from that output using something like:

library(stringr)
unique(str_split_fixed(locales, "_", 2)[, 1])    #Solaris
unique(str_split_fixed(locales, "\\.", 2)[, 1])  #Mac
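If you'd rather not depend on stringr, the same guesses can be sketched in base R with `sub()`. This assumes, as above, that the Solaris names are the part before the first underscore and the Mac names the part before the first dot; `solaris_names()` and `mac_names()` are hypothetical helper names:

```r
# Base-R equivalents of the str_split_fixed() calls: keep everything before
# the first "_" (Solaris guess) or the first "." (Mac guess).
solaris_names <- function(locales) unique(sub("_.*$", "", locales))
mac_names     <- function(locales) unique(sub("\\..*$", "", locales))

locales <- c("C", "C.utf8", "POSIX", "af_ZA", "af_ZA.utf8", "am_ET")
solaris_names(locales)  # "C" "C.utf8" "POSIX" "af" "am"
mac_names(locales)      # "C" "POSIX" "af_ZA" "am_ET"
```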

Locales on Windows are much more problematic: they require long names of the form “language_country”, for example:

Sys.setlocale("LC_ALL", "German_Germany")

I can't find a reliable reference for the list of locales under Windows. Calling locale -a from the Windows command line fails unless Cygwin is installed, and then it returns the same values as under Linux (I'm guessing it reads them from a standard C library).

There doesn't seem to be a list of locales packaged with R (I thought there might be something similar to share/zoneinfo/zone.tab, which contains time zone details).

My current best strategy is to browse this webpage from Microsoft and form the name by manipulating the SUBLANG column of the table.

http://msdn.microsoft.com/en-us/library/dd318693.aspx

Some guesswork is needed; for example, the locale corresponding to SUBLANG_ENGLISH_UK is English_United Kingdom.

Sys.setlocale("LC_ALL", "English_United Kingdom")

Where there are variants in different alphabets, parentheses are needed.

Sys.setlocale("LC_ALL", "Uzbek (Latin)_Uzbekistan")
Sys.setlocale("LC_ALL", "Uzbek (Cyrillic)_Uzbekistan")
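The naming rule described above (Language_Country, with the script in parentheses where a language has variants in different alphabets) can be captured in a small helper. A sketch only: `windows_locale_name()` is hypothetical, and a well-formed name is no guarantee that Windows will accept it.

```r
# Hypothetical helper: build a Windows-style "Language_Country" name, adding
# the script in parentheses when there are variants in different alphabets.
windows_locale_name <- function(language, country, script = NA_character_) {
  n <- max(length(language), length(country), length(script))
  language <- rep_len(language, n)
  script   <- rep_len(script, n)
  lang <- ifelse(is.na(script), language, paste0(language, " (", script, ")"))
  paste0(lang, "_", country)
}

windows_locale_name("English", "United Kingdom")
# "English_United Kingdom"
windows_locale_name("Uzbek", "Uzbekistan", c("Latin", "Cyrillic"))
# "Uzbek (Latin)_Uzbekistan" "Uzbek (Cyrillic)_Uzbekistan"
```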

This guesswork wouldn't be too bad, but many locales don't work at all, including most Indian locales.

Sys.setlocale("LC_ALL", "Hindi_India")
Sys.setlocale("LC_ALL", "Tamil_India")
Sys.setlocale("LC_ALL", "Sindhi_Pakistan")
Sys.setlocale("LC_ALL", "Nynorsk_Norway")
Sys.setlocale("LC_ALL", "Amharic_Ethiopia")
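Rather than trying names one at a time, a candidate list can be checked in one go. A minimal sketch; `check_locales()` is a hypothetical helper that relies only on Sys.setlocale() returning "" (with a warning) when the OS refuses a name. It checks LC_TIME by default because restoring a composite LC_ALL string afterwards may not work.

```r
# Report which candidate names the OS accepts for one category, restoring
# the previous setting afterwards. Sys.setlocale() returns "" on failure.
check_locales <- function(candidates, category = "LC_TIME") {
  old <- Sys.getlocale(category)
  on.exit(Sys.setlocale(category, old), add = TRUE)
  vapply(candidates, function(loc) {
    nzchar(suppressWarnings(Sys.setlocale(category, loc)))
  }, logical(1))
}

# Named logical vector: TRUE where the OS accepted the name
check_locales(c("Hindi_India", "Tamil_India", "Sindhi_Pakistan",
                "Nynorsk_Norway", "Amharic_Ethiopia"))
```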

The Windows Region and Language dialog box (Windows\System32\intl.cpl) has a similar but not identical list of available locales, but I don't know where that list is populated from.


There are several related questions:
1. Mac and Solaris people: please can you check to see if my code for getting locales works under your OS.
2. Indian/Pakistani/Norwegian/Ethiopian people using Windows: Please can you tell me what Sys.getlocale() returns for you.
3. Other Windows people: Is there any better documentation on which locales are available?

Update: After clicking links in the question that Ben B mentioned, I stumbled across this better list of locales in Windows. By manually changing the locale using the Region and Language dialog and calling Sys.getlocale(), I deduced that Nynorsk is "Norwegian-Nynorsk_Norway". There are still many oddities, for example

Sys.setlocale("LC_ALL", "Inuktitut (Latin)_Canada")

is fine, but

Sys.setlocale("LC_ALL", "Inuktitut (Syllabics)_Canada")

fails (as do most of the Indian languages). Starting R in any of these locales produces a warning, and R's locale reverts to C.
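The deduction step in the update (switching the locale in the dialog, then reading the name back from Sys.getlocale()) is easier if the result is split by category. A minimal sketch, assuming the "LC_COLLATE=...;LC_CTYPE=...;..." form that Sys.getlocale() uses on Windows and Linux; `split_getlocale()` is a hypothetical helper, and the input below is an illustrative string in that format rather than captured output:

```r
# Hypothetical helper: split a "key=value;key=value" locale string into a
# named vector. Strings without "=" (one name for every category) are
# returned unchanged.
split_getlocale <- function(x = Sys.getlocale()) {
  if (!grepl("=", x, fixed = TRUE)) return(x)
  parts <- strsplit(x, ";", fixed = TRUE)[[1]]
  setNames(sub("^[^=]*=", "", parts), sub("=.*$", "", parts))
}

# Illustrative input in the Windows format
x <- "LC_COLLATE=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"
time_name <- split_getlocale(x)[["LC_TIME"]]
sub("\\.[0-9]+$", "", time_name)  # drop the code page suffix
# "English_United Kingdom"
```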

I'm still interested to hear from anyone in India (or the other countries above) about what locale Sys.getlocale() reports for them.

Richie Cotton
  • See http://stackoverflow.com/questions/5152866/list-of-locales-in-windows and http://superuser.com/questions/166089/where-is-the-list-of-available-windows-locales (but the latter question is yours!). Googling "windows list available locales" shows questions you have asked in other places (I can't tell whether you've forgotten that you asked, or whether none of these answers works; it seems to be quite a mess). – Ben Bolker Jan 07 '14 at 02:25
  • @BenBolker It seems senile dementia is setting in. I remembered having problems with locales before, but I'd completely forgotten asking that question. Thanks for the reminder. – Richie Cotton Jan 07 '14 at 13:23
  • See also http://stackoverflow.com/q/26603564/134830 – Richie Cotton Oct 28 '14 at 14:49

2 Answers


In answer to your first question, here's the output on my Mac:

> locales <- system("locale -a", intern = TRUE)
> library(stringr)
> unique(str_split_fixed(locales, "\\.", 2)[, 1]) 
 [1] "af_ZA" "am_ET" "be_BY" "bg_BG" "ca_ES" "cs_CZ" "da_DK" "de_AT" "de_CH"
[10] "de_DE" "el_GR" "en_AU" "en_CA" "en_GB" "en_IE" "en_NZ" "en_US" "es_ES"
[19] "et_EE" "eu_ES" "fi_FI" "fr_BE" "fr_CA" "fr_CH" "fr_FR" "he_IL" "hi_IN"
[28] "hr_HR" "hu_HU" "hy_AM" "is_IS" "it_CH" "it_IT" "ja_JP" "kk_KZ" "ko_KR"
[37] "lt_LT" "nl_BE" "nl_NL" "no_NO" "pl_PL" "pt_BR" "pt_PT" "ro_RO" "ru_RU"
[46] "sk_SK" "sl_SI" "sr_YU" "sv_SE" "tr_TR" "uk_UA" "zh_CN" "zh_HK" "zh_TW"
[55] "C"     "POSIX"

I'm not sure what I should expect to see from Sys.setlocale(), but it doesn't throw any errors:

> Sys.setlocale(locale="he_IL")
[1] "he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8"
> Sys.getlocale()
[1] "he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8"
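On the Mac, Sys.getlocale() returns one name per category, separated by slashes. A sketch that labels the pieces, assuming the BSD category order LC_COLLATE/LC_CTYPE/LC_MONETARY/LC_NUMERIC/LC_TIME/LC_MESSAGES (`split_mac_locale()` is a hypothetical helper). Under that assumption, the output above means he_IL was set for every category except LC_NUMERIC, which R keeps at C, and LC_MESSAGES, which Sys.setlocale("LC_ALL", ...) doesn't change.

```r
# Assumed macOS/BSD category order for slash-separated locale strings
mac_categories <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY",
                    "LC_NUMERIC", "LC_TIME", "LC_MESSAGES")

split_mac_locale <- function(x = Sys.getlocale()) {
  parts <- strsplit(x, "/", fixed = TRUE)[[1]]
  if (length(parts) != length(mac_categories)) return(x)  # not the slash form
  setNames(parts, mac_categories)
}

split_mac_locale("he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8")
```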
Scott Ritchie

Thanks all. I went to the URL that Richie suggested, http://msdn.microsoft.com/en-us/library/dd318693.aspx, and tried LANG_BELARUSIAN as the locale in Windows. That didn't work, so I lopped off the "LANG_" and used "BELARUSIAN" by itself, which worked fine with lubridate's ymd_hms():

> bk.date1
[1] "Ma 2012 august 14 11:28:30 "
> ymd_hms(bk.date1, locale = "BELARUSIAN")
[1] "2012-08-14 11:28:30 UTC"
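The trick above (dropping the LANG_ prefix from the constant name on the MSDN page) is easy to script. A minimal sketch; `lang_constant_to_name()` is a hypothetical helper, and the only name confirmed to work here is "BELARUSIAN", so test each result, for example with Sys.setlocale(), before relying on it:

```r
# Hypothetical helper: strip the "LANG_" prefix from a Windows language
# constant name, e.g. "LANG_BELARUSIAN" -> "BELARUSIAN".
lang_constant_to_name <- function(x) sub("^LANG_", "", x)

candidate <- lang_constant_to_name("LANG_BELARUSIAN")
# Only meaningful on Windows; Sys.setlocale() returns "" if the name is refused
if (.Platform$OS.type == "windows") {
  Sys.setlocale("LC_TIME", candidate)
}
```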

Bill Yarberry