118

What is the difference between <html lang="en"> and <html lang="en-US">? What other values can follow the dash?

According to w3.org "Any two-letter subcode is understood to be a [ISO3166] country code." so does that mean any value listed under the alpha-2 code is an accepted value?

Oded
Celeritas

6 Answers

135

<html lang="en">
<html lang="en-US">

The first lang tag only specifies a language code. The second specifies a language code, followed by a country code.

What other values can follow the dash? According to w3.org "Any two-letter subcode is understood to be a [ISO3166] country code." so does that mean any value listed under the alpha-2 code is an accepted value?

Yes; however, the value may or may not have any real meaning.

<html lang="en-US"> essentially means "this page is in the US style of English." In a similar way, <html lang="en-GB"> would mean "this page is in the United Kingdom style of English."

If you really wanted to specify an unlikely combination, you could. <html lang="en-ES"> is valid according to the specification, as I understand it, but that language/country combination won't do much, since English isn't commonly spoken in Spain.

I mean does this somehow further help the browser to display the page?

It doesn't help the browser to display the page, but it is useful for search engines, screen readers, and other things that might read and try to interpret the page, besides human beings.

Jeremy Wiggins
  • FWIW, the official languages of Uganda are actually English and Swahili. – Muhammad Alkarouri May 01 '13 at 18:32
  • Ha, good point. How American of me. :( I updated the example to Spain, and did a little legwork this time to make sure English isn't an official language there, too. Thanks for the tip. – Jeremy Wiggins May 01 '13 at 19:07
  • @JeremyWiggins, about the last two lines of your answer, starting with "It doesn't help the browser...": what if the website is international (internationalized)? Would setting the language tag still be needed? – Yustme May 19 '14 at 12:46
  • Regarding the last two lines: if the page uses hyphenation from CSS (`hyphens: auto`), then the `lang` attribute is required to allow the browser to select the proper set of rules. – RobertT Oct 30 '14 at 02:42
  • A proper language setting helps not only search engines and screen readers; there is also a typographical effect. For instance, simple quotes are only interpreted properly with the correct language setting, which differentiates between de-DE, de-CH, fr and fr-CH. – theking2 Jan 11 '18 at 09:03
8

You can use any country code, yes, but that doesn't mean a browser or other software will recognize it or do anything differently because of it. For example, a screen reader might treat "en-US" and "en-GB" the same way if it only supports an American accent in English. Another piece of software that has two distinct voices, though, could adjust according to the country code.
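The fallback behaviour described above can be sketched in a few lines of Python. This is a hypothetical illustration, not how any particular screen reader works: the `pick_voice` function and the voice tables are made up for the example, and the strategy of dropping trailing subtags until something matches is only loosely modelled on the lookup scheme in RFC 4647.

```python
def pick_voice(lang_tag, voices):
    """Return the voice for the most specific matching prefix of lang_tag.

    `voices` is a hypothetical mapping from lowercase language tags
    to voice names; None is returned when nothing matches.
    """
    subtags = lang_tag.lower().split("-")
    # Try the full tag first ("en-gb"), then drop one trailing
    # subtag at a time ("en") until a known tag is found.
    while subtags:
        candidate = "-".join(subtags)
        if candidate in voices:
            return voices[candidate]
        subtags.pop()
    return None


# Software with a single English voice treats en-US and en-GB alike.
one_voice = {"en": "American voice"}
# Software with two voices can tell the two tags apart.
two_voices = {"en-us": "American voice", "en-gb": "British voice"}

print(pick_voice("en-US", one_voice))
print(pick_voice("en-GB", one_voice))
print(pick_voice("en-GB", two_voices))
```

With the single-voice table both tags resolve to the same voice; with the two-voice table the country code changes the result.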

woz
8

This should help: http://www.w3.org/International/articles/language-tags/

The golden rule when creating language tags is to keep the tag as short as possible. Avoid region, script or other subtags except where they add useful distinguishing information. For instance, use ja for Japanese and not ja-JP, unless there is a particular reason that you need to say that this is Japanese as spoken in Japan, rather than elsewhere.

The list below shows the various types of subtag that are available. We will work our way through these and how they are used in the sections that follow.

language-extlang-script-region-variant-extension-privateuse
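As a rough illustration of how the positions in that pattern can be told apart, here is a minimal Python sketch that labels subtags purely by their shape. The function name `split_language_tag` is hypothetical, and the shapes it assumes (four letters for a script, two letters or three digits for a region, and five to eight characters, or a digit plus three characters, for a variant) reflect my reading of BCP 47. It deliberately ignores extlang, extension and private-use subtags, and it checks nothing against the registry.

```python
import re


def split_language_tag(tag):
    """Label the subtags of a simple language tag by shape alone.

    Simplified sketch: handles language, script, region and variant
    subtags only, and does not check them against any registry.
    """
    subtags = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,3}", subtags[0]):
        raise ValueError(f"unexpected language subtag: {subtags[0]!r}")
    result = {
        "language": subtags[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
    }
    for sub in subtags[1:]:
        nothing_after_script = result["region"] is None and not result["variants"]
        if (re.fullmatch(r"[A-Za-z]{4}", sub)
                and result["script"] is None and nothing_after_script):
            result["script"] = sub.title()      # e.g. "Hant"
        elif (re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", sub)
                and nothing_after_script):
            result["region"] = sub.upper()      # e.g. "US"
        elif re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", sub):
            result["variants"].append(sub.lower())
        else:
            raise ValueError(f"unexpected subtag: {sub!r}")
    return result


print(split_language_tag("ja"))
print(split_language_tag("en-US"))
print(split_language_tag("zh-Hant-TW"))
```

A shape check like this only says a tag is well formed; whether a given combination means anything to the software reading it is a separate question.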

Alfred DSouza
  • Some software applications default to US spelling and localization when generic English options are chosen, e.g. Windows does this for the English language pack. https://technet.microsoft.com/en-us/library/cc766191(v=ws.10).aspx Windows (unhelpfully) has only one language pack for some countries which speak multiple languages, like the Netherlands (Dutch, not French), yet four for Spain (Catalan, Galician, Basque, Spanish). Belgium gets zero, possibly because its multiple national languages are all majority languages in other countries. – Mousey Jul 23 '15 at 18:11
2

RFC 3066 gives the details of the allowed values (emphasis and links added):

All 2-letter subtags are interpreted as ISO 3166 alpha-2 country codes from [ISO 3166], or subsequently assigned by the ISO 3166 maintenance agency or governing standardization bodies, denoting the area to which this language variant relates.

I interpret that as meaning any valid (according to ISO 3166) 2-letter code is valid as a subtag. The RFC goes on to state:

Tags with second subtags of 3 to 8 letters may be registered with IANA, according to the rules in chapter 5 of this document.

By the way, that looks like a typo, since chapter 3, not chapter 5, seems to cover the registration process.

A quick search for the IANA registry reveals a very long list of all the available language subtags. Here's one example from the list (which would be used as en-scouse):

Type: variant

Subtag: scouse

Description: Scouse

Added: 2006-09-18

Prefix: en

Comments: English Liverpudlian dialect known as 'Scouse'

There are all sorts of subtags available; a quick scroll has already revealed fr-1694acad (17th century French).
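To show how a registry record such as the Scouse entry maps to a usable tag, here is a small Python sketch. The `parse_record` helper is hypothetical, and it assumes each field sits on a single "Field: value" line and appears only once; I believe real registry records can repeat fields and wrap long lines, which this does not handle.

```python
def parse_record(text):
    """Read 'Field: value' lines into a dict (one value per field)."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between fields
        field, _, value = line.partition(":")
        record[field.strip()] = value.strip()
    return record


# The record quoted above, copied as plain text.
SCOUSE = """\
Type: variant
Subtag: scouse
Description: Scouse
Added: 2006-09-18
Prefix: en
Comments: English Liverpudlian dialect known as 'Scouse'
"""

record = parse_record(SCOUSE)
# A variant is appended to its recommended prefix to form the tag.
tag = f"{record['Prefix']}-{record['Subtag']}"
print(tag)  # en-scouse
```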


The usefulness of some of these (I would say the vast majority of these) tags, when it comes to documents designed for display in the browser, is limited. The W3C Internationalization specification simply states:

Browsers and other applications can use information about the language of content to deliver to users the most appropriate information, or to present information to users in the most appropriate way. The more content is tagged and tagged correctly, the more useful and pervasive such applications will become.

I'm struggling to find detailed information on how browsers behave when encountering different language tags, but they are most likely going to offer some benefit to those users who use a screen reader, which can use the tag to determine the language/dialect/accent in which to present the content.

James Allardice
0

XML Schema requires that the xml namespace be declared and imported before using xml:lang (and other xml namespace values). RELAX NG predeclares the xml namespace, as in XML, so no additional declaration is needed.

-1

Well, the first question is easy. There are many ens (Englishes) but (mostly) only one US English. One would guess there are en-CN, en-GB and en-AU. I'd guess there might even be an Austrian English, but that's more a case of "yes, you can" than "yes, there is".

Bardi Harborow
Wes Miller