RFC 3066 gives the details of the allowed values (emphasis and links added):
All 2-letter subtags are interpreted as ISO 3166 alpha-2 country codes
from [ISO 3166], or subsequently assigned by the ISO 3166 maintenance
agency or governing standardization bodies, denoting the area to which
this language variant relates.
I interpret that as meaning any valid (according to ISO 3166) 2-letter code is valid as a subtag. The RFC goes on to state:
Tags with second subtags of 3 to 8 letters may be registered with
IANA, according to the rules in chapter 5 of this document.
By the way, that looks like a typo: chapter 3, not chapter 5, seems to cover the registration process.
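To make the two quoted rules concrete, here is a rough Python sketch that sorts a tag's second subtag into one of the two buckets by length alone. The function name `classify_second_subtag` and the return labels are my own invention, not anything defined by the RFC, and the sketch does not check that a 2-letter subtag really is an assigned ISO 3166 code or that a longer one really is registered with IANA.

```python
def classify_second_subtag(tag: str) -> str:
    """Classify the second subtag of a language tag by its length.

    Hypothetical helper: it only mirrors the two rules quoted above and
    does not look anything up in ISO 3166 or the IANA registry.
    """
    subtags = tag.split("-")
    if len(subtags) < 2:
        return "none"  # no second subtag at all, e.g. "en"

    second = subtags[1]
    if not second.isascii():
        return "other"
    if len(second) == 2 and second.isalpha():
        return "country"  # interpreted as an ISO 3166 alpha-2 code
    if 3 <= len(second) <= 8 and second.isalnum():
        return "registered"  # would have to be registered with IANA
    return "other"  # anything the two quoted rules do not cover


print(classify_second_subtag("en-GB"))      # country
print(classify_second_subtag("en-scouse"))  # registered
```

A tag being classified as "registered" here only means it has the right shape; whether it is actually valid depends on the registry.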
A quick search for the IANA registry reveals a very long list of all the available language subtags. Here's one example from the list (which would be used as en-scouse):
Type: variant
Subtag: scouse
Description: Scouse
Added: 2006-09-18
Prefix: en
Comments: English Liverpudlian dialect known as 'Scouse'
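If you want to work with records like this programmatically, the "Field: value" layout is easy to read. Below is a minimal Python sketch that parses the record above into a dictionary and builds the en-scouse tag from its Prefix and Subtag fields. `parse_record` is a hypothetical helper of my own; it assumes one field per line, so it would need extending for registry entries that wrap long values across lines or repeat a field such as Prefix.

```python
def parse_record(text: str) -> dict:
    """Parse one 'Field: value' registry record into a dict.

    Simplifying assumption: every field sits on a single line and
    appears at most once in the record.
    """
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return record


SCOUSE = """\
Type: variant
Subtag: scouse
Description: Scouse
Added: 2006-09-18
Prefix: en
Comments: English Liverpudlian dialect known as 'Scouse'
"""

record = parse_record(SCOUSE)
# A variant is appended to its prefix to form the full tag.
example_tag = f"{record['Prefix']}-{record['Subtag']}"
print(example_tag)  # en-scouse
```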
There are all sorts of subtags available; a quick scroll has already revealed fr-1694acad (17th century French).
For documents designed for display in the browser, the usefulness of some of these tags (I would say the vast majority of them) is limited. The W3C Internationalization specification simply states:
Browsers and other applications can use information about the language
of content to deliver to users the most appropriate information, or to
present information to users in the most appropriate way. The more
content is tagged and tagged correctly, the more useful and pervasive
such applications will become.
I'm struggling to find detailed information on how browsers behave when they encounter different language tags, but the tags are most likely to benefit users of screen readers, which can use the tag to determine the language, dialect, or accent in which to present the content.