17

I'm using ISO 3166-1 alpha-2 codes to pass to an application to retrieve a localised feed, e.g. /feeds/us for the USA. I have a switch statement which serves a feed based on that country_code.

Is there a way to convert that two-letter code to a language code (locale), e.g. en_US? I'm wondering if there is a standard / function / library for doing this in PHP, or whether I need to build my own array?

codecowboy
  • Which language code should "CH" use? "IN"? "NO"? – Ignacio Vazquez-Abrams Apr 16 '12 at 14:10
  • Take a look at the new list; the list provided in the answer is way out of date. This one contains 422 - 460 entries. http://msdn.microsoft.com/en-us/library/cc233968.aspx – Erx_VB.NExT.Coder Sep 28 '13 at 01:29
  • I was going to suggest that if your goal is to use the locale to format currency for a local market there is a strategy that HACKS it through: – David Lundquist Oct 12 '17 at 08:25
  • If your goal for the language locale is to display currency values correctly, I have found this simple HACK that works "OK": prepend 'en_' to your country code. The currency display will default to the common standard for that country. – David Lundquist Oct 12 '17 at 08:32

5 Answers

18

As others have pointed out, there is no built-in function for this, most likely because many countries have more than one language. So unfortunately I can't point you to a library that does this, but I did go ahead and write a little function which does what you want.

There are two caveats. The first is that if it isn't given a language, it will just pick the first locale in the list that matches the country. To get around this, you'd have to put some logic around the function call to provide it with the appropriate language. The second is that it needs php5-intl installed.

<?php

/**
 * Returns a locale for the country code that is provided.
 *
 * @param string $country_code  ISO 3166-1 alpha-2 country code
 * @param string $language_code ISO 639-1 alpha-2 language code
 * @return string|null a locale, formatted like en_US, or null if not found
 */
function country_code_to_locale($country_code, $language_code = '')
{
    // Locale list taken from:
    // http://stackoverflow.com/questions/3191664/
    // list-of-all-locales-and-their-short-codes
    $locales = array('af-ZA',
                    'am-ET',
                    'ar-AE',
                    'ar-BH',
                    'ar-DZ',
                    'ar-EG',
                    'ar-IQ',
                    'ar-JO',
                    'ar-KW',
                    'ar-LB',
                    'ar-LY',
                    'ar-MA',
                    'arn-CL',
                    'ar-OM',
                    'ar-QA',
                    'ar-SA',
                    'ar-SY',
                    'ar-TN',
                    'ar-YE',
                    'as-IN',
                    'az-Cyrl-AZ',
                    'az-Latn-AZ',
                    'ba-RU',
                    'be-BY',
                    'bg-BG',
                    'bn-BD',
                    'bn-IN',
                    'bo-CN',
                    'br-FR',
                    'bs-Cyrl-BA',
                    'bs-Latn-BA',
                    'ca-ES',
                    'co-FR',
                    'cs-CZ',
                    'cy-GB',
                    'da-DK',
                    'de-AT',
                    'de-CH',
                    'de-DE',
                    'de-LI',
                    'de-LU',
                    'dsb-DE',
                    'dv-MV',
                    'el-GR',
                    'en-029',
                    'en-AU',
                    'en-BZ',
                    'en-CA',
                    'en-GB',
                    'en-IE',
                    'en-IN',
                    'en-JM',
                    'en-MY',
                    'en-NZ',
                    'en-PH',
                    'en-SG',
                    'en-TT',
                    'en-US',
                    'en-ZA',
                    'en-ZW',
                    'es-AR',
                    'es-BO',
                    'es-CL',
                    'es-CO',
                    'es-CR',
                    'es-DO',
                    'es-EC',
                    'es-ES',
                    'es-GT',
                    'es-HN',
                    'es-MX',
                    'es-NI',
                    'es-PA',
                    'es-PE',
                    'es-PR',
                    'es-PY',
                    'es-SV',
                    'es-US',
                    'es-UY',
                    'es-VE',
                    'et-EE',
                    'eu-ES',
                    'fa-IR',
                    'fi-FI',
                    'fil-PH',
                    'fo-FO',
                    'fr-BE',
                    'fr-CA',
                    'fr-CH',
                    'fr-FR',
                    'fr-LU',
                    'fr-MC',
                    'fy-NL',
                    'ga-IE',
                    'gd-GB',
                    'gl-ES',
                    'gsw-FR',
                    'gu-IN',
                    'ha-Latn-NG',
                    'he-IL',
                    'hi-IN',
                    'hr-BA',
                    'hr-HR',
                    'hsb-DE',
                    'hu-HU',
                    'hy-AM',
                    'id-ID',
                    'ig-NG',
                    'ii-CN',
                    'is-IS',
                    'it-CH',
                    'it-IT',
                    'iu-Cans-CA',
                    'iu-Latn-CA',
                    'ja-JP',
                    'ka-GE',
                    'kk-KZ',
                    'kl-GL',
                    'km-KH',
                    'kn-IN',
                    'kok-IN',
                    'ko-KR',
                    'ky-KG',
                    'lb-LU',
                    'lo-LA',
                    'lt-LT',
                    'lv-LV',
                    'mi-NZ',
                    'mk-MK',
                    'ml-IN',
                    'mn-MN',
                    'mn-Mong-CN',
                    'moh-CA',
                    'mr-IN',
                    'ms-BN',
                    'ms-MY',
                    'mt-MT',
                    'nb-NO',
                    'ne-NP',
                    'nl-BE',
                    'nl-NL',
                    'nn-NO',
                    'nso-ZA',
                    'oc-FR',
                    'or-IN',
                    'pa-IN',
                    'pl-PL',
                    'prs-AF',
                    'ps-AF',
                    'pt-BR',
                    'pt-PT',
                    'qut-GT',
                    'quz-BO',
                    'quz-EC',
                    'quz-PE',
                    'rm-CH',
                    'ro-RO',
                    'ru-RU',
                    'rw-RW',
                    'sah-RU',
                    'sa-IN',
                    'se-FI',
                    'se-NO',
                    'se-SE',
                    'si-LK',
                    'sk-SK',
                    'sl-SI',
                    'sma-NO',
                    'sma-SE',
                    'smj-NO',
                    'smj-SE',
                    'smn-FI',
                    'sms-FI',
                    'sq-AL',
                    'sr-Cyrl-BA',
                    'sr-Cyrl-CS',
                    'sr-Cyrl-ME',
                    'sr-Cyrl-RS',
                    'sr-Latn-BA',
                    'sr-Latn-CS',
                    'sr-Latn-ME',
                    'sr-Latn-RS',
                    'sv-FI',
                    'sv-SE',
                    'sw-KE',
                    'syr-SY',
                    'ta-IN',
                    'te-IN',
                    'tg-Cyrl-TJ',
                    'th-TH',
                    'tk-TM',
                    'tn-ZA',
                    'tr-TR',
                    'tt-RU',
                    'tzm-Latn-DZ',
                    'ug-CN',
                    'uk-UA',
                    'ur-PK',
                    'uz-Cyrl-UZ',
                    'uz-Latn-UZ',
                    'vi-VN',
                    'wo-SN',
                    'xh-ZA',
                    'yo-NG',
                    'zh-CN',
                    'zh-HK',
                    'zh-MO',
                    'zh-SG',
                    'zh-TW',
                    'zu-ZA',);

    foreach ($locales as $locale)
    {
        $locale_region = locale_get_region($locale);
        $locale_language = locale_get_primary_language($locale);
        $locale_array = array('language' => $locale_language,
                             'region' => $locale_region);

        // Match on the country alone when no language was given,
        // otherwise require the language to match as well.
        if (strtoupper($country_code) == $locale_region &&
            ($language_code == '' ||
             strtolower($language_code) == $locale_language))
        {
            return locale_compose($locale_array);
        }
    }

    return null;
}
?>
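
The first caveat can be handled with a little logic around the lookup. The following is only a minimal sketch and not part of the function above: it assumes you already have an ordered list of the visitor's preferred languages (for example, taken from the Accept-Language header), and the helper name pick_locale_for_country and the shortened locale list are made up for illustration. It splits the tags with plain string functions, so this particular sketch does not need intl. Like the function above, it drops any script subtag from the result.

```php
<?php

/**
 * Picks a locale for a country, trying the preferred languages in order.
 * Hypothetical helper for illustration; not part of the answer's function.
 *
 * @param string   $country_code        ISO 3166-1 alpha-2 country code
 * @param string[] $preferred_languages ISO 639-1 codes, most preferred first
 * @param string[] $locales             tags such as 'de-CH' or 'sr-Latn-RS'
 * @return string|null a locale formatted like de_CH, or null if not found
 */
function pick_locale_for_country($country_code, array $preferred_languages, array $locales)
{
    $country_code = strtoupper($country_code);
    $candidates = array();

    // Collect one candidate per language for the requested country
    foreach ($locales as $locale) {
        $parts = explode('-', $locale);
        $language = strtolower($parts[0]);
        $region = strtoupper($parts[count($parts) - 1]);

        if ($region === $country_code && !isset($candidates[$language])) {
            $candidates[$language] = $language . '_' . $region;
        }
    }

    // Return the first preferred language that exists for this country
    foreach ($preferred_languages as $language) {
        $language = strtolower($language);
        if (isset($candidates[$language])) {
            return $candidates[$language];
        }
    }

    // No preference matched: fall back to the first locale listed for the country
    return $candidates ? reset($candidates) : null;
}

$locales = array('de-CH', 'fr-CH', 'it-CH', 'rm-CH', 'en-US', 'es-US');

echo pick_locale_for_country('ch', array('en', 'fr'), $locales), "\n"; // fr_CH
echo pick_locale_for_country('CH', array(), $locales), "\n";           // de_CH
```

With an empty preference list it behaves like the original function and returns the first match, so the order of the locale list still matters in that case.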
TheJF
8

As noted in other answers, there is no one-to-one mapping between countries and languages. However, if you have the PHP Intl extension installed, it should be possible to use the Unicode CLDR likely subtags data to get the “default” or “likely” language for a specific country:

function getLanguage(string $country): string {
    $subtags = \ResourceBundle::create('likelySubtags', 'ICUDATA', false);
    $country = \Locale::canonicalize('und_'.$country);
    $locale = $subtags->get($country) ?: $subtags->get('und');
    return \Locale::getPrimaryLanguage($locale);
}

Now when you call the getLanguage() function with a country code, you get the corresponding language code back:

getLanguage('US'); // "en"
getLanguage('GB'); // "en"
getLanguage('DE'); // "de"
getLanguage('CH'); // "de"
getLanguage('IN'); // "hi"
getLanguage('NO'); // "nb"
getLanguage('BR'); // "pt"

This also works fine for three-letter country codes:

getLanguage('USA'); // "en"
getLanguage('GBR'); // "en"
getLanguage('AUT'); // "de"
getLanguage('FRA'); // "fr"

And even UN M49 codes:

getLanguage('003'); // "en"
getLanguage('013'); // "es"
getLanguage('039'); // "it"
getLanguage('155'); // "de"
ausi
  • On PHP 7.4/Windows your code returns "en" for all of the above samples (the $locale in your method always gets 'en_Latn_US' regardless of the input) – atyachin Nov 20 '21 at 07:38
  • @atyachin please check what the following code prints on your system: `print_r(iterator_to_array(\ResourceBundle::create('likelySubtags', 'ICUDATA', false)));` It should look something like this: https://3v4l.org/lLgMd – ausi Nov 20 '21 at 08:39
  • The issue is with the canonicalize method that returns the format _US. Can be fixed by: `$locale = $subtags->get("und_".strtoupper($country)) ?: $subtags->get('und');` – atyachin Nov 21 '21 at 20:03
  • @atyachin If `\Locale::canonicalize('und_de')` returns `'en_US'` in your system, I’m pretty sure there is something wrong with the setup. Can you test what the output is with `\Locale::canonicalize('und_gb')` and with `\Locale::canonicalize('en_GB')`? – ausi Nov 21 '21 at 20:05
  • If I remove `und_` from `$country = \Locale::canonicalize($country);` it appears to work – Tim Ramsey May 04 '23 at 13:41
  • Worth saying that for me - I had to define the $country myself with `$locale = $subtags['und_'.$countryCode];` and this then worked ... – Antony Jul 26 '23 at 14:13
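
Pulling the comments above together, here is a hedged sketch of a more defensive variant. It builds the `und_XX` key directly instead of calling `\Locale::canonicalize()`, as suggested in the comments, and it returns null rather than guessing when intl or the likelySubtags bundle is unavailable. The function name likely_language_for_country is made up for illustration, the sketch only accepts two-letter codes, and whether the bundle can be loaded at all depends on the ICU data shipped with your PHP build.

```php
<?php

/**
 * Looks up the "likely" language for a two-letter country code.
 * Sketch only: returns null instead of guessing when anything is missing.
 */
function likely_language_for_country(string $country): ?string
{
    // Accept only two ASCII letters, so no canonicalization is needed
    if (!preg_match('/^[A-Za-z]{2}$/', $country)) {
        return null;
    }

    // The lookup below relies on the intl extension
    if (!extension_loaded('intl')) {
        return null;
    }

    $subtags = \ResourceBundle::create('likelySubtags', 'ICUDATA', false);
    if (!$subtags instanceof \ResourceBundle) {
        return null;
    }

    // Build the key directly, as suggested in the comments
    $locale = $subtags->get('und_' . strtoupper($country));
    if (!is_string($locale)) {
        return null;
    }

    return \Locale::getPrimaryLanguage($locale);
}

var_dump(likely_language_for_country('de'));
var_dump(likely_language_for_country('D3')); // NULL: rejected by the validation
```

Unlike the original, this variant does not fall back to the `und` entry, so the caller decides what to do when a country is unknown.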
6

You cannot automatically convert a country code to a language code because some countries use multiple languages. On the other hand, the OS localization system may support multiple variants of a single language for different countries (for example, en_GB vs en_US).

For example, Switzerland (CH) has both German and French in common use (64% and 20% of the population, according to http://en.wikipedia.org/wiki/Switzerland). If you have to decide on a single language for country code CH, either of those languages could make sense for some people. Note that some parts of Switzerland use only German or only French as the official language (but not both, see http://en.wikipedia.org/wiki/File:Sprachen_CH_2000_EN.svg for details).

If you MUST select a single language for each country, I'd suggest doing the selection by hand for every country you support. For a half-assed automatic implementation, you could scan through your available localizations and select the first one that has the matching country code after the underscore.
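
As a rough illustration of the hand-picked approach, a plain lookup array is enough. This is only a sketch: the function name locale_for_country, the fallback value and the entries in the map are example choices, not recommendations for any particular country.

```php
<?php

/**
 * Returns the hand-picked locale for a country, or a fallback.
 * The entries are example choices only; pick your own per supported country.
 *
 * @param string $country_code ISO 3166-1 alpha-2 country code
 * @param string $fallback     locale to use for unsupported countries
 * @return string a locale formatted like en_US
 */
function locale_for_country($country_code, $fallback = 'en_US')
{
    // Illustrative choices, one per supported country
    static $map = array(
        'US' => 'en_US',
        'GB' => 'en_GB',
        'DE' => 'de_DE',
        'CH' => 'de_CH', // could just as well be fr_CH
        'FI' => 'fi_FI',
    );

    $country_code = strtoupper(trim($country_code));

    return isset($map[$country_code]) ? $map[$country_code] : $fallback;
}

echo locale_for_country('ch'), "\n"; // de_CH
echo locale_for_country('zz'), "\n"; // en_US
```

Keeping the map explicit means every ambiguous country is a visible decision rather than an accident of list order.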

Also note the corollary: languages cannot be represented by national flags because languages and countries do not have a 1:1 relation. One-to-many relations can be found in both directions.

Mikko Rantalainen
  • Mikko, you know the world, and this is a quality answer. As a Swiss, I want to say that in a multilingual country it is usual to show the country name in its language. Take Apple as an example: they switch between CH_de and CH_fr, or spelled out, between Schweiz-German and Suisse-Français. – endo.anaconda May 01 '12 at 00:36
  • @endo.anaconda: I guess you meant de_CH and fr_CH. I didn't know that using country name followed by language name was a common labelling style. Here in Finland, the locale sv_FI is usually referred as "suomenruotsi" which translates directly to "Finland's Swedish" instead of "Swedish of Finland", too. – Mikko Rantalainen May 02 '12 at 10:46
1

You will want to cross reference these files:

http://www.ethnologue.com/codes/LanguageIndex.tab http://www.ethnologue.com/codes/CountryCodes.tab http://www.ethnologue.com/codes/LanguageCodes.tab

Or get them all in one zip here: http://www.ethnologue.com/codes/Language_Code_Data_20110104.zip

As far as I'm aware, there is currently no built-in PHP function that returns this data.

AO_
1

The answer from TheJF is pretty good; however, there are a few (general) issues that I came across:

  • His code will return br-FR if you call country_code_to_locale("FR"). Now, br (Breton) is not even an official language according to Wikipedia. Although fr-FR is in the list, br-FR comes first in the array. This happens with many other countries too.

  • Many other locale lists try to be extremely complete and consider all possible languages.

  • It is difficult to draw the line here; good examples where you certainly want to keep multiple languages for a country are Canada and Switzerland.
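
To see which countries are affected by the first point, it can help to group a locale list by region and print the regions that have more than one entry. This is a small sketch under the assumption that the region is always the last '-'-separated part of the tag, which holds for the lists in this thread; the function name group_locales_by_region and the sample list are made up for illustration.

```php
<?php

/**
 * Groups locale tags by their region subtag.
 * Assumes the region is the last '-'-separated part, as in the lists here.
 *
 * @param string[] $locales tags such as 'fr-FR' or 'sr-Latn-RS'
 * @return array region code => list of locale tags, in input order
 */
function group_locales_by_region(array $locales)
{
    $by_region = array();

    foreach ($locales as $locale) {
        $parts = explode('-', $locale);
        $region = strtoupper(end($parts));
        $by_region[$region][] = $locale;
    }

    return $by_region;
}

$sample = array('br-FR', 'co-FR', 'fr-FR', 'de-DE', 'de-CH', 'fr-CH');

// Print only the regions where a choice has to be made
foreach (group_locales_by_region($sample) as $region => $tags) {
    if (count($tags) > 1) {
        echo $region, ': ', implode(', ', $tags), "\n";
    }
}
// FR: br-FR, co-FR, fr-FR
// CH: de-CH, fr-CH
```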

I went with a simple approach:

  • I kept only 1 language for most countries, and left multiple for some countries like BE, CA, CH, ZA. I kept es-US, but I am not sure about that (Wikipedia says: Official languages: None at federal level)

  • I also kept multiple languages for countries I was too lazy to research or that use both Latin and Cyrillic scripts

  • I added shuffle($locales); which will randomize the array, such that we get random locales for countries with multiple languages. It made sense for my use case, but you might want to remove that.

  • For my purpose, only languages that have relevant prevalence on the web are of interest. This list is by no means complete or correct, but pragmatic.

So here is my locale list:

$locales = array('af-ZA',
                'am-ET',
                'ar-AE',
                'ar-BH',
                'ar-DZ',
                'ar-EG',
                'ar-IQ',
                'ar-JO',
                'ar-KW',
                'ar-LB',
                'ar-LY',
                'ar-MA',
                'ar-OM',
                'ar-QA',
                'ar-SA',
                'ar-SY',
                'ar-TN',
                'ar-YE',
                'az-Cyrl-AZ',
                'az-Latn-AZ',
                'be-BY',
                'bg-BG',
                'bn-BD',
                'bs-Cyrl-BA',
                'bs-Latn-BA',
                'cs-CZ',
                'da-DK',
                'de-AT',
                'de-CH',
                'de-DE',
                'de-LI',
                'de-LU',
                'dv-MV',
                'el-GR',
                'en-AU',
                'en-BZ',
                'en-CA',
                'en-GB',
                'en-IE',
                'en-JM',
                'en-MY',
                'en-NZ',
                'en-SG',
                'en-TT',
                'en-US',
                'en-ZA',
                'en-ZW',
                'es-AR',
                'es-BO',
                'es-CL',
                'es-CO',
                'es-CR',
                'es-DO',
                'es-EC',
                'es-ES',
                'es-GT',
                'es-HN',
                'es-MX',
                'es-NI',
                'es-PA',
                'es-PE',
                'es-PR',
                'es-PY',
                'es-SV',
                'es-US',
                'es-UY',
                'es-VE',
                'et-EE',
                'fa-IR',
                'fi-FI',
                'fil-PH',
                'fo-FO',
                'fr-BE',
                'fr-CA',
                'fr-CH',
                'fr-FR',
                'fr-LU',
                'fr-MC',
                'he-IL',
                'hi-IN',
                'hr-BA',
                'hr-HR',
                'hu-HU',
                'hy-AM',
                'id-ID',
                'ig-NG',
                'is-IS',
                'it-CH',
                'it-IT',
                'ja-JP',
                'ka-GE',
                'kk-KZ',
                'kl-GL',
                'km-KH',
                'ko-KR',
                'ky-KG',
                'lb-LU',
                'lo-LA',
                'lt-LT',
                'lv-LV',
                'mi-NZ',
                'mk-MK',
                'mn-MN',
                'ms-BN',
                'ms-MY',
                'mt-MT',
                'nb-NO',
                'ne-NP',
                'nl-BE',
                'nl-NL',
                'pl-PL',
                'prs-AF',
                'ps-AF',
                'pt-BR',
                'pt-PT',
                'ro-RO',
                'ru-RU',
                'rw-RW',
                'sv-SE',
                'si-LK',
                'sk-SK',
                'sl-SI',
                'sq-AL',
                'sr-Cyrl-BA',
                'sr-Cyrl-CS',
                'sr-Cyrl-ME',
                'sr-Cyrl-RS',
                'sr-Latn-BA',
                'sr-Latn-CS',
                'sr-Latn-ME',
                'sr-Latn-RS',
                'sw-KE',
                'tg-Cyrl-TJ',
                'th-TH',
                'tk-TM',
                'tr-TR',
                'uk-UA',
                'ur-PK',
                'uz-Cyrl-UZ',
                'uz-Latn-UZ',
                'vi-VN',
                'wo-SN',
                'yo-NG',
                'zh-CN',
                'zh-HK',
                'zh-MO',
                'zh-SG',
                'zh-TW');

and the code:

function country_code_to_locale($country_code)
{
    $locales = ...

    // randomize the array, such that we get random locales
    // for countries with multiple languages (CA, CH)
    shuffle($locales);

    foreach ($locales as $locale) {
        $locale_region = locale_get_region($locale);

        if (strtoupper($country_code) == $locale_region) {
            return $locale;
        }
    }

    return "en-US";
}
Eugen
  • If PHP reports "undefined function locale_get_region()", it is because you need to enable the intl extension. To do so, go to your php.ini and uncomment this line: ;extension=intl. If you're running on Windows and get an "icuuc57.dll" not found error, make sure the PHP executable folder contains this library and that the folder is in the system PATH. Good luck, making the Eugen function work was a roller coaster ride!! – Gustavo Rodríguez Aug 04 '22 at 20:32