
Many articles on the internet (like this one) suggest using xml:lang or a custom attribute to encode language meta-information on XML elements. They mention that these codes have to comply with the BCP 47 standard.

Let's see what happens if I declare and use the language attribute the way the articles suggest:

  1. Inside DTD: <!ATTLIST text xml:lang NMTOKEN #IMPLIED>
  2. Inside XML: <text xml:lang="YODU991Yklew-e-ijsw02ijwk">...</text>

What is the expected result?

I expected the DTD validator to check whether YODU991Yklew-e-ijsw02ijwk is a real BCP 47 language tag (whether the language, script and country subtags exist) and to mark it red if any of them is incorrect, the same way http://schneegans.de/ helps validate these codes (WRONG code vs. CORRECT code).

What happens instead?

The validator perceives this attribute only as some text and does not check whether it is a real language code or gibberish.
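
To illustrate why the gibberish value passes, here is a minimal sketch of the only check the NMTOKEN declaration implies. It is an assumption-laden approximation: `NMTOKEN_ASCII` and `is_ascii_nmtoken` are names made up for this example, and the pattern covers only the ASCII part of the XML name-character set (the real production also admits many non-ASCII characters).

```python
import re

# Simplified, ASCII-only approximation of the XML NameChar set
# (letters, digits, '.', '-', '_', ':'). The full XML production
# also allows many non-ASCII characters, which are ignored here.
NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:\-]+")


def is_ascii_nmtoken(value: str) -> bool:
    """Return True if value is a non-empty run of ASCII name characters."""
    return NMTOKEN_ASCII.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("en-US", "YODU991Yklew-e-ijsw02ijwk", "en US"):
        print(candidate, is_ascii_nmtoken(candidate))
```

Under this rule the string from the question is accepted just like `en-US`, because nothing in the check knows anything about languages, scripts or regions.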

soshial
  • Whether or not a language code conforms to BCP 47 is simply not something that's validated by a DTD. The only thing a generic XML + DTD validator would do, based on these declarations, is validate that the attribute contents conform to the `NMTOKEN` rules. A validator that actually respects the DTD would mark `YODUY:klew-e-ijswe%_fijwk` as invalid -- but because `%` is not an allowed character in XML names, not because it is not a language tag. A more complex rule to check the tag (for syntactical validity alone) could be encoded in an XML Schema rule using a regex. – Jeroen Mostert Jun 07 '19 at 14:09
  • If you want to go beyond syntactical validity and right into correctness (so that `ha-HA` is marked as incorrect because `HA` is not a valid region), you won't get around writing (or somehow obtaining) a separate validator to run over the document after the usual well-formedness checks, as that sort of extended semantic analysis is beyond the capacities of both DTD and XML Schema (while keeping the markup reasonable, that is). – Jeroen Mostert Jun 07 '19 at 14:19
  • @JeroenMostert, thx for the valuable comments! If BCP47 validation is beyond DTD and XML Schema validation, then why is there any mention/recommendation of BCP47 usage? Maybe it is possible to enforce BCP47 validation using DTD ENTITY or NOTATION, or by linking to some external DTD schema? – soshial Jun 07 '19 at 15:30
  • Why wouldn't they mention what you should use? Just because neither DTD nor XML Schema actually validate the tags doesn't mean nobody cares about what's in the tags. And no, you can't enforce validation through the subset of DTD supported by XML -- at least not *practically*. The DTD would have to list all possible combinations of languages, regions, variants -- I don't see it. NOTATION would only help insofar as you can signal that it should be an unparsed external entity, but then that still skips any actual validation (since that would have to be external). – Jeroen Mostert Jun 08 '19 at 11:28
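
The "separate validator" idea from the comments could look roughly like the sketch below. Treat it as an illustration under stated assumptions, not a complete BCP 47 implementation: the `LANGTAG` pattern handles only language, script, region and variant subtags (no extlang, extensions, private use or grandfathered tags), `KNOWN_REGIONS` is a tiny made-up sample standing in for the real IANA Language Subtag Registry, and `check_lang` / `check_document` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified BCP 47 shape: language[-script][-region][-variant]*.
# Assumption: extlang, extensions, private-use and grandfathered
# tags are deliberately left out to keep the sketch short.
LANGTAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"
)

# Hypothetical sample only; a real checker would load the full
# IANA Language Subtag Registry instead.
KNOWN_REGIONS = {"US", "GB", "DE", "NG"}


def check_lang(tag: str) -> str:
    """Classify a tag as 'ok', 'malformed' or 'unknown-region'."""
    match = LANGTAG.fullmatch(tag)
    if match is None:
        return "malformed"
    region = match.group("region")
    if region is not None and region.upper() not in KNOWN_REGIONS:
        return "unknown-region"
    return "ok"


def check_document(xml_text: str) -> list:
    """Return (element name, tag, verdict) for every xml:lang attribute."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, el.get(XML_LANG), check_lang(el.get(XML_LANG)))
        for el in root.iter()
        if el.get(XML_LANG) is not None
    ]


if __name__ == "__main__":
    sample = (
        '<doc><text xml:lang="en-US">a</text>'
        '<text xml:lang="ha-HA">b</text>'
        '<text xml:lang="YODU991Yklew-e-ijsw02ijwk">c</text></doc>'
    )
    for row in check_document(sample):
        print(row)
```

The regex stage corresponds to the syntactic check that could also live in an XML Schema pattern; the registry lookup is the part that neither DTD nor XML Schema can reasonably express, which is why it runs as a separate pass after parsing.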
