
In the HTML5 specification, both the parsing section and the syntax section state that an element name can be anything that starts with a letter and is followed by alphanumeric characters.

Now the question is: what happens if I introduce additional elements that are not part of the specification but that comply with the specified syntax?

What do browsers do when they encounter elements with a custom, unknown name? Are those elements treated like any other element, or are they left out, stripped, or replaced?

How do HTML5 editors, for instance, behave?

Is there anything in the specifications I have overlooked regarding valid element tag names?

[Update]

The specification was misleading here, since it states that HTML element names are alphanumeric. While reading an HTML5 specification, I wrongly concluded that this holds for all element names.

That is apparently wrong. The parser section only requires that an element name start with an ASCII letter; after that letter, anything is accepted except:

"tab" (U+0009)
"LF" (U+000A)
"FF" (U+000C)
U+0020 SPACE
"/" (U+002F)
">" (U+003E)
U+0000 NULL
EOF

Apart from those characters, which require special treatment (either a parse error or ending the tag name), all other characters seem to be allowed:

Anything else
--> Append the current input character to the current tag token's tag name.

In my field tests, several parsers also accepted other Unicode letters as the first character (or at least handled them gracefully).

[/Update]
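The tag-name rules quoted above can be sketched in Python. This is a simplified model, not a conforming tokenizer: it only reads the name of a start tag, assumes the input begins with "<", and leaves out attributes, comments, end tags and most error recovery. The function name `read_tag_name` and its return shape are hypothetical; the error codes follow the names the current WHATWG spec uses for these parse errors. Note that the full tag name state also lowercases ASCII uppercase letters, which the list above does not mention.

```python
from __future__ import annotations

# Characters that end a tag name and switch to attribute parsing.
WHITESPACE = {"\t", "\n", "\f", " "}


def read_tag_name(markup: str) -> tuple[str | None, list[str]]:
    """Return (tag_name, parse_errors) for a start tag at the start of markup.

    tag_name is None when no start tag token would be produced.
    """
    if not markup.startswith("<"):
        raise ValueError("markup must start with '<'")
    errors: list[str] = []
    pos = 1
    # Tag open state: decide whether "<" starts a tag at all.
    if pos >= len(markup):
        errors.append("eof-before-tag-name")
        return None, errors  # the real tokenizer emits "<" as text
    first = markup[pos]
    if first in "!/?":
        # These have their own branches in the real tokenizer.
        raise ValueError("comments, doctypes and end tags are not modeled")
    if not (first.isascii() and first.isalpha()):
        errors.append("invalid-first-character-of-tag-name")
        return None, errors  # "<" is emitted as text instead
    # Tag name state: the first letter is reconsumed here, so it is lowercased too.
    name: list[str] = []
    while pos < len(markup):
        ch = markup[pos]
        if ch in WHITESPACE or ch in "/>":
            return "".join(name), errors  # name ends here
        if "A" <= ch <= "Z":
            name.append(ch.lower())  # ASCII upper alpha is lowercased
        elif ch == "\0":
            errors.append("unexpected-null-character")
            name.append("\ufffd")  # NULL becomes U+FFFD REPLACEMENT CHARACTER
        else:
            name.append(ch)  # "Anything else": append the character as-is
        pos += 1
    errors.append("eof-in-tag")  # EOF inside the tag: no tag token is emitted
    return None, errors
```

With these rules, `read_tag_name("<Martin-Kersten>")` returns `("martin-kersten", [])`: the hyphen is simply appended and the uppercase letters are lowercased.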

Martin Kersten
  • Since there's no necessity to not use div tags with class names, I can't see why anyone would create new tags only to then have to concern themselves with what older browsers might or might not do. If you really want to add that trouble to your development to save a few keystrokes, read this http://www.sitepoint.com/5-reasons-why-you-can-use-html5-today/ – Popnoodles Dec 01 '13 at 17:22
  • I am designing a template extension that should fit into the HTML5 specification as well as possible. Working with those templates will involve HTML editors, browsers, etc., so I am interested in how the specification behaves and what the alternatives are. Sadly, the namespace option is gone. – Martin Kersten Dec 01 '13 at 20:35
  • HTML5 = HTML; it's a buzzword, not an architecture. Not sure exactly what you mean by template extension, but there are already several well-written, useful templating systems, for example Twig and Mustache. – Popnoodles Dec 01 '13 at 20:39
  • The HTML template relates to a component-oriented approach where you draw elements from a library, so one must know which names are allowed and how editors and browsers handle them, to allow a better template editing and preview workflow. – Martin Kersten Dec 02 '13 at 08:42
  • Note that carriage return characters are removed from the input stream before they reach the tokeniser (a CR LF pair becomes LF, and a lone CR becomes LF), so they are not covered by the rules for the tokeniser. See http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream – Alohci Dec 02 '13 at 10:10
  • It's an excerpt from the version of the spec I used. – Martin Kersten Dec 02 '13 at 12:11
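The input-stream preprocessing Alohci refers to can be sketched in a couple of lines of Python. This assumes the newline rule as the current WHATWG spec words it (every CR LF pair becomes a single LF, then every remaining CR becomes LF); the function name `normalize_newlines` is hypothetical.

```python
def normalize_newlines(text: str) -> str:
    """Sketch of HTML input-stream newline normalization.

    CR LF pairs collapse to LF first, then any lone CR becomes LF,
    so the tokenizer never sees a U+000D character.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

The order of the two replacements matters: converting lone CRs first would turn each CR LF pair into two line feeds.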

1 Answer


In the HTML5 specification, both the parsing section and the syntax section state that an element name can be anything that starts with a letter and is followed by alphanumeric characters.

Incorrect. The specification states that the element name must be one of the names explicitly listed in that document, or in another applicable specification. These include but are not limited to SVG and MathML.

The specification also includes a processing specification for consumers of HTML, such as browsers. This doesn't describe what's "allowed"; it describes what those consumers should do with each character of the document, regardless of whether it contains things that are allowed or not allowed.

Now the question is: what happens if I introduce additional elements that are not part of the specification but that comply with the specified syntax?

The above rules are followed. The "specified syntax" is irrelevant. The specification describes what the consumer should do for any input stream of characters.

What do browsers do when they encounter elements with a custom, unknown name? Are those elements treated like any other element, or are they left out, stripped, or replaced?

They are treated as elements in the http://www.w3.org/1999/xhtml namespace which implement the HTMLUnknownElement interface.

How do HTML5 editors, for instance, behave?

If they are HTML5-compliant, they will behave the same way when reading in the HTML.

Is there anything in the specifications I have overlooked regarding valid element tag names?

See the first paragraph above. Also see the Custom Elements spec, under which any element name that starts with an ASCII letter and contains a hyphen is considered valid. It is unclear whether that specification is currently an "applicable specification" for HTML5, but if not, it will very probably become one soon.
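For comparison, here is a minimal Python sketch of the "valid custom element name" check as the current HTML Living Standard defines it, to the best of my understanding. It is somewhat stricter than "starts with an ASCII letter and contains a hyphen": the name must match the PotentialCustomElementName grammar (a lowercase ASCII letter first, at least one hyphen, no ASCII uppercase letters) and must not be one of eight hyphenated names already used by SVG and MathML elements. The function name is hypothetical, and the character ranges should be checked against the current spec text before relying on them.

```python
import re

# PCENChar from the PotentialCustomElementName grammar: "-", ".", digits,
# "_", lowercase ASCII letters, and a set of non-ASCII ranges.
_PCEN_CHAR = (
    r"[-._0-9a-z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u203F-\u2040\u2070-\u218F"
    r"\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF]"
)

# [a-z] (PCENChar)* '-' (PCENChar)*
_POTENTIAL_NAME = re.compile("[a-z]" + _PCEN_CHAR + "*-" + _PCEN_CHAR + "*")

# Hyphenated names already taken by SVG and MathML elements.
_RESERVED = frozenset({
    "annotation-xml", "color-profile", "font-face", "font-face-src",
    "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
})


def is_valid_custom_element_name(name: str) -> bool:
    """Return True if name could be registered as a custom element."""
    return name not in _RESERVED and _POTENTIAL_NAME.fullmatch(name) is not None
```

The uppercase restriction rarely matters for parsed markup, since the tokenizer lowercases tag names anyway, but it does matter for names passed to `customElements.define()`, which throws for invalid names.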

Alohci
  • As far as I understood the HTML5 specification, a hyphen ('-') is not possible within an element identifier. – Martin Kersten Dec 01 '13 at 20:36
  • @MartinKersten - That's not correct. Hyphens are possible. They just can't be the first character of the element name. It's easy to test. Just create an HTML file with an element containing a hyphen, load it into a browser and use a DOM inspector to see the result. Or you can use the [Live DOM Viewer](http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0A%3CMartin-Kersten%3E%0ATest%20message%0A%3C%2FMartin-Kersten%3E) – Alohci Dec 01 '13 at 21:02
  • I checked the parser states, including the tag name state, and it states: Anything else: Append the current input character to the current tag token's tag name. – Martin Kersten Dec 02 '13 at 08:13
  • From what I understand, these days the "proper" way to create custom HTML elements is by using the `CustomElementRegistry` object, and the element name MUST contain a hyphen. https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_custom_elements – Gavin Nov 27 '19 at 06:25