Questions tagged [ietf-bcp-47]

Use this tag for questions related to the handling of identifiers ("tags") for spoken and written languages, as handled in a programming context. Specifically, this tag is for identifiers which conform to the IETF's BCP 47 document "Tags for Identifying Languages".

Overview

BCP 47 is an IETF "Best Current Practice" document that defines language tags for identifying written and spoken languages.

The document:

"specifies a particular identifier mechanism (the language tag) and a registration function for values to be used to form tags."


Basic Examples

  • fr - French
  • fr-CA - Canadian French
  • es-419 - Spanish as used in Latin America and the Caribbean
  • zh-Hant - Chinese written using Traditional Han script

Structure Overview

BCP 47 language tags have a flexible structure which can contain the following subtags, in this order, separated by hyphens:

language-extlang-script-region-variant-extension-private

The language subtag is mandatory and must come first. Its values are taken from ISO 639 language codes.

The extlang (extended language) subtag can be used to provide more specificity - for example, cmn for Mandarin in zh-cmn (Mandarin Chinese).

The script subtag can be used to make a distinction between different written formats of a language (for example, Hant vs. Hans for traditional vs. simplified Chinese).

The region subtag can be a two-letter ISO 3166-1 country code (CA in fr-CA) or a three-digit UN M.49 region code (419 in es-419).
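The two region forms can be told apart by shape alone. Below is a minimal sketch in Python; the function name classify_region is hypothetical, and it only checks the shape of the subtag, not whether the code is actually registered.

```python
import re


def classify_region(subtag: str) -> str:
    """Classify a region subtag by shape only (hypothetical helper).

    This does not check that the code exists in the IANA registry.
    """
    if re.fullmatch(r"[A-Za-z]{2}", subtag):
        # Two letters: country code form, as in fr-CA
        return "ISO 3166-1 alpha-2"
    if re.fullmatch(r"[0-9]{3}", subtag):
        # Three digits: UN M.49 form, as in es-419
        return "UN M.49"
    raise ValueError(f"not shaped like a region subtag: {subtag!r}")


print(classify_region("CA"))   # ISO 3166-1 alpha-2
print(classify_region("419"))  # UN M.49
```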

The variant subtag can provide a finer-grained distinction, such as a dialect or an orthography (for example, 1996 in de-1996 for the German orthography of 1996). It is rarely needed in common usage.

The extension and private-use subtags can be used for further customized language data.
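The structure above can be sketched as a regular expression that splits a tag into its subtags. This is a simplified illustration in Python, not a full implementation: the names _LANGTAG and parse_language_tag are hypothetical, the pattern only checks that a tag is well-formed, it does not validate subtags against the IANA registry, and it ignores grandfathered tags.

```python
import re

# Simplified pattern modelled on the subtag order described above.
# Assumptions: no grandfathered tags, no registry lookup, and the
# language subtag is accepted as any 2 to 8 letters.
_LANGTAG = re.compile(
    r"""
    ^(?P<language>[a-z]{2,8})
    (?P<extlang>(?:-[a-z]{3}){0,3})
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
    (?:-x(?P<private>(?:-[a-z0-9]{1,8})+))?$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_language_tag(tag: str) -> dict:
    """Split a well-formed tag into its subtags (hypothetical helper)."""
    match = _LANGTAG.match(tag)
    if match is None:
        raise ValueError(f"not a well-formed language tag: {tag!r}")

    def pieces(group: str) -> list:
        # Turn "-a-b" (or None) into ["a", "b"] (or []).
        return [p for p in (match.group(group) or "").split("-") if p]

    return {
        "language": match.group("language"),
        "extlang": pieces("extlang"),
        "script": match.group("script"),
        "region": match.group("region"),
        "variants": pieces("variants"),
        # Flat list: each singleton followed by its own subtags.
        "extensions": pieces("extensions"),
        "private": pieces("private"),
    }


parsed = parse_language_tag("zh-cmn-Hant-TW")
print(parsed["language"], parsed["extlang"], parsed["script"], parsed["region"])
# zh ['cmn'] Hant TW
```

For everyday work, a maintained library or the platform's own locale API is likely a safer choice than a hand-written pattern like this one.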


Resources

16 questions
22
votes
7 answers

Getting the user's region with navigator.language

For some time, I've been using something like this to get my user's country (ISO-3166): const region = navigator.language.split('-')[1]; // 'US' I've always assumed the string would be similar to en-US -- where the country would hold the 2nd…
Jeff
  • 2,293
  • 4
  • 26
  • 43
16
votes
1 answer

How to I get the IETF BCP47 Language code in Android API < 21

Is there a clever way to get the BCP47 language code in Android for APIs less than 21? In API level 21+ the Locale.toLanguageTag is exactly what I need. How would you get this in lower API levels?
superdave
  • 1,928
  • 1
  • 17
  • 35
5
votes
2 answers

How to convert IETF BCP 47 language identifier to ISO-639-2?

I am writing a server API for an iOS application. As a part of the initialization process, the app should send the phone interface language to server via an API call. The problem is that Apple uses something called IETF BCP 47 language identifier in…
Adam Matan
  • 128,757
  • 147
  • 397
  • 562
2
votes
0 answers

IANA time zone ID to BCP-47 using ICU4C

Given an IANA time zone ID, such as "America/New_York" or "Europe/Lisbon", how can I obtain the corresponding BCP-47 time zone ID, such as "usnyc" or "ptlis", using ICU4C? These values are required to generate Unicode BCP-47 Locale IDs with…
kpozin
  • 25,691
  • 19
  • 57
  • 76
1
vote
0 answers

Are all combinations of language codes and regions in the language-subtag-registry valid?

RFC 5646 (https://www.rfc-editor.org/rfc/rfc5646.html) and IANA language subtag registry (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) describe and list the language and region codes that make up the tags for…
Dave Cherkassky
  • 353
  • 4
  • 16
1
vote
1 answer

Taiwanese language and country codes

I'm a bit uncertain between the two variations below: zh-cht and zh-tw - it's for a site in traditional Chinese, mostly in Taiwan, but with a presence in Macao and Hong Kong. So zh-cht and zh-tw seem to represent the same language. Possibly there are…
Rogelio
  • 910
  • 5
  • 14
1
vote
2 answers

Is there a list of BCP 47 language codes in R?

I'm running the fantastic pandoc from within an R package, relying on the LaTeX babel package for some typesetting niceties. Pandoc expects a lang argument as a BCP 47 code (e.g. en-US), but babel expects its own language codes (e.g.…
maxheld
  • 3,963
  • 2
  • 32
  • 51
1
vote
1 answer

AAPT ERROR: Invalid BCP 47 tag in directory name b+sr+latn_values

I am trying to run a command via aapt to test out the functionality. ./aapt package -f --no-crunch -M /home/username/AndroidStudioProjects/ProjectName/androidTest/src/main/AndroidManifest.xml -I…
jgm
  • 1,230
  • 1
  • 19
  • 39
1
vote
1 answer

Do I need hreflang x-default and can I've multiple hreflang point to same URL?

Based on the Google info about hreflang, I came up with this, but I have the en and default pointing to the same URL instead of having another en/. Will that be fine? I don't want to create another folder as it requires additional maintenance. Basically, the…
sparkmix
  • 2,157
  • 3
  • 25
  • 33
0
votes
1 answer

Get exact language object from display name

Using langcodes package, how do I obtain the exact language object from the display name? For example, langcodes.find("English (United Kingdom)") returns Language.make(language='en') instead of returning Language.make(language='en', territory='GB')…
Anm
  • 447
  • 4
  • 15
0
votes
0 answers

How can I get the name of a language in any other language, based on IETF language tag

I'm looking for a way to get the name of a language in any other language, based on an IETF language tag. For example: I have a list of IETF language tags ('en', 'fr', 'nl', 'de', ...) and I want them mapped to the language display name of my…
Renaat De Muynck
  • 3,267
  • 1
  • 22
  • 18
0
votes
0 answers

How to normalize semantically same language tags? [cldr]

I am currently browsing the cldr-common-42 database and I find the use of language tags a bit confusing. For example, the tag ar-EG is used for translations in Egyptian Arabic. However, when looking for what "Egyptian Arabic" is in other languages,…
0
votes
0 answers

BCP 47 language tag for Gaelic with overdot

In traditional orthography, Gaelic uses the overdot with certain consonants, instead of appending an "h". For example, "ḃ" is equivalent to "bh". What is the BCP 47 language tag for Gaelic with overdot?
jochen
  • 3,728
  • 2
  • 39
  • 49
0
votes
1 answer

converting TrueType Macintosh Language Codes to BCP 47 language tags

TrueType fonts use "Macintosh Language Codes" to describe the language of localised strings in the "name" table. A list of language codes can be found in the TrueType spec. I need to convert these language codes to BCP 47 language tags. Is…
jochen
  • 3,728
  • 2
  • 39
  • 49
0
votes
0 answers

How to validate xml:lang ATTLIST inside XML with DTD?

Many articles on the internet (like this one) suggest using xml:lang or some custom attribute to encode meta-information about language inside XML tags. They mention that these codes have to comply with BCP47 standard. Let's see what would happen if…
soshial
  • 5,906
  • 6
  • 32
  • 40