
Starting with only the locale identifier name (string) provided by clients, how or where do I look up the default "list separator" character for that locale?

The "list separator" setting is the character many different types of applications and programming languages may use as the default grouping character when joining or splitting strings and arrays. This is especially important for opening CSV files in spreadsheet programs. Though this is often the comma ",", this default character may be different depending on the machine's region settings. It may even differ between operating systems.

I'm not interested in my own server environment here. I need to know about the client's environment, based on the locale identifier they've given me, so my own server settings are irrelevant. Also, for this solution I cannot change the locale setting on this server to match a client's for the entire current process as a shortcut to look this value up.

If this is defined in the ICU library, I'm not able to find any way to look this value up using the INTL extension.

Any hints?

bob-the-destroyer
  • http://stackoverflow.com/questions/838590/how-to-read-list-separator-from-os-in-java – Mark Baker Feb 03 '13 at 23:09
  • @Mark Baker: Can't use the registry for this solution - security issues and hoping for cross-OS solution. Plus I'm wanting to just look up the character based on a locale name, not what my own server machine is configured to use. On that post, romeok gave a good way to guess the character, but that's still just a guess. – bob-the-destroyer Feb 03 '13 at 23:26
  • Coincidentally, trying to infer the list separator from checking the decimal and numeric grouping separators is what I'm currently using. But in the world of programming, IMHO it's dangerous to assume what a value is solely based on some completely unrelated value. – bob-the-destroyer Feb 03 '13 at 23:53
  • Do you have a way of running some code on the client's machine when they send you their locale? Say, in Javascript, you could send, along with locale, a joined array or a formatted date object, and could then derive the separator from that. – Boris Feb 15 '13 at 14:06
  • @Boris: Perhaps I spoke too soon when I said many different programming languages use this character. Javascript uses the comma as default in `join()` regardless of system settings, and using `toLocaleString()` on an array to discover this character only works in IE. Also, in cases where I'm serving up a file to be downloaded (like a csv), I won't really have the option of running client-side scripts at that point. – bob-the-destroyer Feb 16 '13 at 01:06

2 Answers


I am not sure if my answer will satisfy your requirements but I suggest (especially as you don't want to change the locale on the server) to use a function that will give you the answer:

To my knowledge (and also Wikipedia's, it seems) the list separator in a CSV is a comma unless the decimal point of the locale is a comma, in which case the list separator is a semicolon.

So you could get a list of all locales that use a comma (Unicode U+002C) as the decimal point using this command:

cd /usr/share/i18n/locales/
grep -l 'decimal_point.*2C' *_*

and you could then take this list to determine the appropriate list separator:

function get_csv_list_separator($locale) {
    // Locales whose decimal point is a comma; these get ";" as the list separator.
    $locales_with_comma_decimal = "az_AZ be_BY bg_BG bs_BA ca_ES crh_UA cs_CZ da_DK de_AT de_BE de_DE de_LU el_CY el_GR es_AR es_BO es_CL es_CO es_CR es_EC es_ES es_PY es_UY es_VE et_EE eu_ES eu_ES@euro ff_SN fi_FI fr_BE fr_CA fr_FR fr_LU gl_ES hr_HR ht_HT hu_HU id_ID is_IS it_IT ka_GE kk_KZ ky_KG lt_LT lv_LV mg_MG mk_MK mn_MN nb_NO nl_AW nl_NL nn_NO pap_AN pl_PL pt_BR pt_PT ro_RO ru_RU ru_UA rw_RW se_NO sk_SK sl_SI sq_AL sq_MK sr_ME sr_RS sr_RS@latin sv_SE tg_TJ tr_TR tt_RU@iqtelif uk_UA vi_VN wo_SN";
    // Compare whole names, case-insensitively, so a fragment such as "de" does not match.
    if (in_array(strtolower($locale), explode(" ", strtolower($locales_with_comma_decimal)), true)) {
        return ";";
    }
    return ",";
}

(The list of locales is taken from my own Debian machine; I don't know how complete it is.)

If you don't want to have this static list of locales (though I assume that this doesn't change that often), you can of course generate the list using the command above and cache it.
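A minimal PHP sketch of that idea, assuming the glibc locale sources are readable in a directory such as the one used above and that a small JSON file is an acceptable cache. The function names `find_comma_decimal_locales` and `cached_comma_decimal_locales` and the cache file are made up for illustration; the pattern is the same one the grep command uses, so it inherits the same blind spots (for example, locale files that only copy their numeric section from another locale).

```php
<?php
// Hypothetical helper: scan a directory of glibc locale sources and return
// the names of the files that mention a U+002C decimal point.
function find_comma_decimal_locales(string $dir): array
{
    $locales = [];
    foreach (glob($dir . '/*_*') ?: [] as $path) {
        if (!is_file($path)) {
            continue; // skip directories that happen to match the glob
        }
        $source = file_get_contents($path);
        // Same pattern as the grep command; "." does not cross line breaks.
        if ($source !== false && preg_match('/decimal_point.*2C/', $source) === 1) {
            $locales[] = basename($path);
        }
    }
    sort($locales);
    return $locales;
}

// Hypothetical helper: reuse a JSON cache file if present, else scan and write it.
function cached_comma_decimal_locales(string $dir, string $cacheFile): array
{
    if (is_file($cacheFile)) {
        $raw = file_get_contents($cacheFile);
        $cached = $raw === false ? null : json_decode($raw, true);
        if (is_array($cached)) {
            return $cached;
        }
    }
    $locales = find_comma_decimal_locales($dir);
    file_put_contents($cacheFile, json_encode($locales));
    return $locales;
}
```

The returned array could then replace the hard-coded string in `get_csv_list_separator`; deleting the cache file forces a rescan.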

As a final note, according to RFC 4180 (section 2, rule 6) the list separator actually never changes; rather, fields containing a comma (which, depending on the locale, also includes floating-point numbers) should be enclosed in double quotes. Though (as linked above) not many people follow the RFC standard.
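To illustrate the quoting rule, here is a small sketch that writes one CSV line by hand. The helper names `csv_quote_field` and `csv_line` are invented for this example, and the separator parameter is an addition so the same code can emit semicolon-separated files; RFC 4180 itself only describes the comma.

```php
<?php
// Hypothetical helper: quote a field only when it contains the separator,
// a double quote, or a line break; embedded quotes are doubled.
function csv_quote_field(string $field, string $separator = ','): string
{
    if (strpbrk($field, $separator . "\"\r\n") === false) {
        return $field;
    }
    return '"' . str_replace('"', '""', $field) . '"';
}

// Hypothetical helper: join the quoted fields and end the record with CRLF.
function csv_line(array $fields, string $separator = ','): string
{
    $quoted = array_map(
        fn($field) => csv_quote_field((string) $field, $separator),
        $fields
    );
    return implode($separator, $quoted) . "\r\n";
}

echo csv_line(['price', '1,5', 'say "hi"']);
```

With the default comma, the decimal value `1,5` is wrapped in quotes; with `;` as the separator it would be left bare.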

akirk
  • @akirk: Thanks, that just might work. I'll get back to you. I don't have a Linux system on hand to compare the locale settings files against Windows, but maybe I can dig one up or just set up a VM. Since you already seem to have a list of locales using the semicolon, this should help. With the list separator, it's the OS's (probably just Windows') fault. Enclosing a column in quotes doesn't help, because the actual column/list separator could be a semicolon on some client machines. Their OS parses the CSV file, sees no semicolon character, and so assumes the entire line is a single column. – bob-the-destroyer Feb 18 '13 at 00:51
  • I don't quite get what the OS has to do with this. A CSV will probably be processed by an application, such as MS Excel, won't it? – akirk Feb 18 '13 at 18:41
  • MS Excel depends on the "list separator" as defined in the Control Panel>Region and Language Settings when parsing and opening a CSV file. If the "list separator" for a Windows user is defined as ";" as in some European settings, it can't correctly parse a CSV file where a comma is used. Of course you can use the Excel Wizard to "import" a CSV file and manually specify the comma, but I'm hoping to avoid having the user take these extra steps to open my CSV file. The user expects a spreadsheet, and if they are not Windows savvy to correct the formatting themselves, they will be left confused. – bob-the-destroyer Feb 24 '13 at 04:27

There's no such locale setting as "list separator". It might be software-specific, but I doubt it's user-specific.

However... You can detect user's locale and try to match the settings.

  1. Get the browser's locale: $accept_lang = $_SERVER['HTTP_ACCEPT_LANGUAGE']; this might contain a list of comma-separated values. Some browsers don't send this, though. more here...

  2. Next you can use setlocale(LC_ALL, $accept_lang); and get available locale settings using $locale_info = localeconv(); more here...
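Because `setlocale` changes the locale for the whole process, here is a hedged alternative sketch: the intl extension's `NumberFormatter` can report the decimal separator for a named locale without touching the process locale. The function name `guess_list_separator` is hypothetical, and the result is still only a guess based on the "decimal comma means semicolon" rule of thumb, not a value read from any actual "list separator" setting.

```php
<?php
// Hypothetical helper: infer a CSV separator from a locale's decimal symbol.
// Assumes the intl extension; falls back to "," when it is unavailable or
// the formatter cannot be created for the given locale.
function guess_list_separator(string $locale): string
{
    if (!class_exists('NumberFormatter')) {
        return ',';
    }
    try {
        $formatter = new NumberFormatter($locale, NumberFormatter::DECIMAL);
        $decimal = $formatter->getSymbol(NumberFormatter::DECIMAL_SEPARATOR_SYMBOL);
    } catch (Throwable $e) {
        return ',';
    }
    // Rule of thumb only: a decimal comma suggests ";" as the list separator.
    return $decimal === ',' ? ';' : ',';
}
```

The locale string can be whatever the client supplied; nothing here alters the settings seen by other requests.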

Mike
  • I'm actually using the "Accept-Language" header in conjunction with INTL's `locale_accept_from_http` function to get the exact locale of the client, and reverting to just the comma if that can't be resolved. So I have their locale, now I just need the list separator. Again though, I can't use `setlocale` as that alters the locale settings of my own server and affects all other unrelated clients hitting my website. – bob-the-destroyer Feb 24 '13 at 04:45