-1

HTML is a subset of SGML.

XHTML is a subset of XML.

Both use separate parsers.

Presuming an HTML document is correctly served as text/html and an XHTML application is correctly served as application/xhtml+xml, is it possible to detect which parser is used to render the page and, if so, how?

  • I do understand exactly what I am asking. Please do not insist on asking why I want to do this.

  • I'd rather not receive answers suggesting that I do not use one language or the other. This is to avoid debate and help produce an answer I can use.

halfer
John
  • Those are probably not your only options - browsers are likely to have at least one other mode for tag soup. – Marcin Dec 16 '11 at 14:24
  • Also, why do you want to do this? It really would seem that just using well formed documents of one type would suffice. – Marcin Dec 16 '11 at 14:25
  • 1
    HTML is no longer an application of SGML. The current HTML standard defines different parsing rules... Quote: "*Also, since neither of the two authoring formats defined in this specification are applications of SGML, a validating SGML system cannot constitute a conformance checker either.*" from here: http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#conformance-checkers – Šime Vidas Dec 18 '11 at 00:47
  • Further proof: "*For compatibility with existing content and prior specifications, this specification describes two authoring formats: one based on XML (referred to as the XHTML syntax), and one using a **custom format inspired by SGML** (referred to as the HTML syntax)*" – Šime Vidas Dec 18 '11 at 00:51
  • 4
    "ask for a clarification *first* if you do not understand" I don't understand why you have a "JavaScript:" title prefix when we have a [javascript] tag. – BoltClock Apr 17 '12 at 12:24
  • @BoltClock'saUnicorn: I don't understand why you didn't [edit] to remove the tag from the title. –  Apr 17 '12 at 13:50
  • 1.) When I search I use the language name (e.g. javascript detect xyz), which makes it easier for people to find. 2.) Three up-votes in less than an hour, when this question has had fewer than a hundred views in several months, is highly suspicious. – John Apr 17 '12 at 14:41

1 Answer

7

[This is a replacement for my original answer. My original idea was to exploit differences in the behaviour of innerHTML. Although it worked fine in IE9, Firefox and Chrome, it turned out that it failed in Opera, which appears to use an HTML parser for innerHTML even for pages served as application/xhtml+xml]


There aren't many ways to tell XML documents apart from HTML documents. One way, however, is to exploit the case-handling differences between HTML and XML.

In particular, the behaviour of Element.tagName differs. In a document parsed as HTML, tagName returns the element name coerced to upper case, whereas in a document parsed as XML the case is preserved. So we can test document.createElement("div").tagName == "DIV", which is true under the HTML parser and false under the XML parser.
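
If you want to reuse the check, it could be wrapped in a small function along these lines. This is only a sketch: detectParser is a name I've made up for illustration, and taking the document as a parameter is my own choice so that the function can be tried with stand-in objects outside a browser.

```javascript
// Hypothetical helper (name and signature are illustrative, not a standard API).
// `doc` is any object exposing a DOM-style createElement method.
function detectParser(doc) {
  // A document parsed as HTML reports tagName in upper case for HTML elements;
  // a document parsed as XML preserves the case that was passed in.
  var tagName = doc.createElement("div").tagName;
  return tagName === "DIV" ? "html" : "xml";
}
```

In a page you would call detectParser(document) and branch on the returned string.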

See this test case:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
    <head>      
        <title>Test Case</title>
        <script>
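            // Run the check once the page has loaded.
            window.onload = function() {
              // tagName is upper-cased in an HTML-parsed document, case-preserved in an XML-parsed one.
              document.getElementById("result")
                .appendChild(document.createTextNode(
                  (document.createElement("div").tagName === "DIV")
                    ? "HTML parser" : "XML parser"));
            };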
        </script>
    </head>
    <body>
        <p id="result"></p>
    </body>
</html>


Alohci
  • Looks like it's on-topic though haven't even had a chance to get online until just now; will look in to this tomorrow and let you know how it works, thanks! – John Dec 18 '11 at 02:52
  • Excellent! I tested it out and appreciate your recanting of innerHTML (which I absolutely dislike, since it does not work properly with the DOM). Accepted as answer and thumbed it up, thank you! – John Dec 19 '11 at 03:24