Use this tag for questions about handling identifiers ("tags") for spoken and written languages in a programming context. Specifically, this tag covers identifiers that conform to the IETF's BCP 47 document, "Tags for Identifying Languages".
Overview
The IETF's BCP 47 is a "Best Current Practice" document for identifying written and spoken languages by means of language tags.
The document:
"specifies a particular identifier mechanism (the language tag) and a registration function for values to be used to form tags."
Basic Examples
- fr: French
- fr-CA: Canadian French
- es-419: Spanish as used in Latin America and the Caribbean
- zh-Hant: Chinese written using Traditional Han script
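The examples above follow the casing that BCP 47 recommends (lowercase language, title-case script, uppercase region), although tags themselves are case-insensitive. The following is a minimal sketch of that position-based casing rule, assuming the input is already a well-formed tag; the function name format_tag_case is hypothetical and chosen only for illustration.

```python
def format_tag_case(tag: str) -> str:
    """Apply the conventional BCP 47 casing to a well-formed tag.

    Assumption: the tag is already well-formed; no validation is done.
    Rule: everything is lowercase, except that two-letter subtags become
    uppercase and four-letter subtags become title case, provided they
    are not the first subtag and do not follow a singleton (such as
    'u' or 'x').
    """
    result = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and not after_singleton and len(subtag) == 2:
            result.append(subtag.upper())        # e.g. region 'ca' -> 'CA'
        elif index > 0 and not after_singleton and len(subtag) == 4:
            result.append(subtag.capitalize())   # e.g. script 'hant' -> 'Hant'
        else:
            result.append(subtag.lower())
        if len(subtag) == 1:
            # Everything after a singleton belongs to an extension or to
            # private use and stays lowercase.
            after_singleton = True
    return "-".join(result)


print(format_tag_case("ZH-hant"))
print(format_tag_case("fr-ca"))
```

Because comparison of tags is case-insensitive, this kind of formatting is only cosmetic; comparing lowercased tags is usually enough for matching.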
Structure Overview
BCP 47 language tags have a flexible structure which can contain the following subtags, in this order, separated by hyphens:
language-extlang-script-region-variant-extension-private
The language subtag is mandatory and must come first. Its values are taken from ISO 639 language codes.
The extlang (extended language) subtag can be used to provide more specificity, for example cmn for Mandarin in zh-cmn (Mandarin Chinese).
The script subtag can be used to make a distinction between different written formats of a language (for example, Hant vs. Hans for traditional vs. simplified Chinese).
The region subtag can be a country code (CA in fr-CA) or a UN M.49 region code (419 in es-419).
The variant subtag can provide a finer-grained definition for dialects and scripts. It is rarely needed in common usage.
The extension and private subtags can be used for further customized language data.
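The structure described above can be sketched as a small parser. This is a simplified illustration, not a full implementation of the BCP 47 grammar: it assumes a 2 or 3 letter language subtag and at most one extlang, it does not handle grandfathered tags or tags that consist only of private-use subtags, and it checks only that a tag is well-formed, not that its subtags are actually registered. The names LANGTAG_PATTERN and parse_language_tag are hypothetical.

```python
import re
from typing import Optional

# Simplified pattern for: language-extlang-script-region-variant-extension-private
LANGTAG_PATTERN = re.compile(
    r"""
    ^(?P<language>[a-z]{2,3})
    (?:-(?P<extlang>[a-z]{3}))?
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
    (?:-x(?P<private>(?:-[a-z0-9]{1,8})+))?
    $
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_language_tag(tag: str) -> Optional[dict]:
    """Split a tag into its subtags, or return None if it does not match."""
    match = LANGTAG_PATTERN.match(tag)
    if match is None:
        return None

    def split(group: str) -> list:
        # Turn '-a-b' into ['a', 'b']; a missing group becomes [].
        return [s for s in (match.group(group) or "").split("-") if s]

    return {
        "language": match.group("language"),
        "extlang": match.group("extlang"),
        "script": match.group("script"),
        "region": match.group("region"),
        "variants": split("variants"),
        "extensions": split("extensions"),  # flattened, singleton included
        "private": split("private"),
    }


for example in ("fr-CA", "es-419", "zh-Hant", "zh-cmn-Hans-CN"):
    print(example, parse_language_tag(example))
```

For production use, a maintained library or the language subtag registry is a better basis than a hand-written pattern, since only the registry can say whether a well-formed subtag is actually valid.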
Resources
- The specification document
- W3C Guide to Language Tags
- Unicode CLDR guide to Picking the Right Language Identifier
- Language tag lookup and validation service