144

How would I cleanly set the doctype of a file to HTML5 <!DOCTYPE html> via XSLT (in this case with collective.xdv)?

The following is the best my Google-fu has been able to find:

<xsl:output
    method="html"
    doctype-public="XSLT-compat"
    omit-xml-declaration="yes"
    encoding="UTF-8"
    indent="yes" />

produces:

<!DOCTYPE html PUBLIC "XSLT-compat" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Paul Sweatte
Jon Hadley
  • 7
    Incidentally, using PUBLIC "XSLT-compat" is out of date. The XSLT-compatible HTML5 doctype is now `<!DOCTYPE html SYSTEM "about:legacy-compat">`. See http://dev.w3.org/html5/spec/syntax.html#doctype-legacy-string – Alohci Aug 02 '10 at 21:05
  • 1
    From the last Editor WD, it looks like almost any doctype is allowed: short `<!DOCTYPE html>`, legacy `<!DOCTYPE html SYSTEM "about:legacy-compat">` and obsoleted ("should not") HTML 4, HTML 4.01, XHTML 1.0 and XHTML 1.1 (all strict DTD when there is SYSTEM). –  Aug 02 '10 at 22:51
  • 1
    Please update some answer to HTML5 as (nowadays) W3C recommendation. – Peter Krauss Feb 09 '15 at 17:19

12 Answers

158

I think this is currently only supported by writing the doctype out as text:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="utf-8" indent="yes" />

  <xsl:template match="/">
    <xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html&gt;</xsl:text>
    <html>
    </html>
  </xsl:template>

</xsl:stylesheet>

This will produce the following output:

<!DOCTYPE html>
<html>
</html>
C. Dragon 76
Dirk Vollmar
  • This is the only standard way. But with MSXSL there is a non-standard way: use an empty xsl:output/@doctype-public and xsl:output/@doctype-system. –  Aug 02 '10 at 15:18
  • 4
    `disable-output-escaping` was meant by Casey – yegor256 Aug 09 '11 at 04:54
  • This worked great once I removed both internal and public doc type attributes from the output method tag. Thanks! – greenland Dec 11 '15 at 17:37
  • This will work most of the time, but it is a hack, and it is unlikely to (i.e. won't) work as expected if you are not serialising your result back to a text file on disk (e.g. if the result of the transform is being passed on to another process without serialisation). – Tom Hillman Oct 04 '16 at 12:28
  • If the doctype and opening html tag end up on the same line, then you can simply add a newline `<!DOCTYPE html>\n` (at least in Java's JAX, some 9 years later) – earcam Jun 12 '19 at 20:48
  • If the doctype and opening html tag end up on the same line and your JAX version does not support `\n`, you can use a newline character reference such as `&#xa;`: `&lt;!DOCTYPE html&gt;&#xa;` – geckoflume Feb 15 '23 at 14:15
  • This script gives me the literal text "&lt;!DOCTYPE html&gt;". If I replace &lt; with <, it produces an error... – xerostomus Jun 19 '23 at 06:01
66

To use the simple HTML doctype <!DOCTYPE html>, you have to use the disable-output-escaping feature: <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;</xsl:text>. However, disable-output-escaping is an optional feature in XSLT, so your XSLT engine or serialization pipeline might not support it.

For this reason, HTML5 provides an alternative doctype for compatibility with HTML5-unaware XSLT versions (i.e. all the currently existing versions of XSLT) and other systems that have the same problem. The alternative doctype is <!DOCTYPE html SYSTEM "about:legacy-compat">. To output this doctype, use the attribute doctype-system="about:legacy-compat" on the xsl:output element without using a doctype-public attribute at all.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html" doctype-system="about:legacy-compat"/>
   ...
   <html>
   </html>
</xsl:stylesheet>
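
The fragment above elides the templates. A minimal complete sketch of the same idea might look like the following; the root template, the `Example` title and the extra `encoding`/`indent` attributes are illustrative assumptions, not part of the original answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- doctype-system is set and doctype-public is deliberately absent,
       which is what asks the serializer for
       <!DOCTYPE html SYSTEM "about:legacy-compat"> -->
  <xsl:output method="html" doctype-system="about:legacy-compat"
              encoding="UTF-8" indent="yes"/>

  <!-- Hypothetical root template: wraps the input's text content in a bare page -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Example</title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

With xsltproc, for example, this could be run as `xsltproc stylesheet.xsl input.xml`, where both file names are placeholders.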
Ian Boyd
hsivonen
  • I appreciate this is probably the correct, standards driven way to accomplish what I want (I've upvoted it as such). But the former isn't supported (my processor falls over) and the latter still results in `"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"` in my doctype. As @Jirka Kosek suggested, I think my XSLT processor might be broken. – Jon Hadley Aug 04 '10 at 12:30
  • 1
    Deliverance (the XSLT processor I am using) mailing list discussion regarding this problem is here: http://www.coactivate.org/projects/deliverance/lists/deliverance-discussion/archive/2010/08/1280925406261 – Jon Hadley Aug 04 '10 at 12:39
  • 1
    The [w3c validator service](https://validator.w3.org/) issues a warning when the document starts with `<!DOCTYPE html SYSTEM "about:legacy-compat">` – Adrian W Jul 13 '18 at 16:05
30
<xsl:output
     method="html"
     doctype-system="about:legacy-compat"
     encoding="UTF-8"
     indent="yes" />

This outputs:

<!DOCTYPE html SYSTEM "about:legacy-compat">

This is my fix, modified from http://ukchill.com/technology/generating-html5-using-xslt/

Jim Michaels
  • 1
    The [w3c validator service](https://validator.w3.org/) issues a warning when the document starts with `<!DOCTYPE html SYSTEM "about:legacy-compat">` – Adrian W Jul 13 '18 at 16:06
  • 1
    @AdrianW The warning is *"Documents should not use about:legacy-compat, except if generated by legacy systems that can't output the standard doctype."*, which is exactly what is happening here with **xslt**. This system **is** a legacy system that **must** emit a `System ID`. The HTML spec makes it very clear that `<!DOCTYPE html SYSTEM "about:legacy-compat">` is the correct html5 doctype. – Ian Boyd Jan 13 '22 at 22:13
21

With Saxon 9.4 you can use:

<xsl:output method="html" version="5.0" encoding="UTF-8" indent="yes" />

This generates:

<!DOCTYPE HTML>
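
For context, a minimal complete stylesheet using this declaration could look like the sketch below. The stylesheet `version="2.0"`, the root template and the `Example` title are assumptions added for illustration; only the `xsl:output` line comes from the answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- version="5.0" on the html method is what the answer relies on;
       behaviour in processors that do not recognise it varies -->
  <xsl:output method="html" version="5.0" encoding="UTF-8" indent="yes"/>

  <!-- Hypothetical root template, present only to make the sketch complete -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Example</title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```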
stephanme
  • 2
    Unfortunately, it's specific to Saxon. On the other hand, it is simply the most concise answer to the Q. I wonder if this works with the other XSLT 2.0 processors? – Paulb Jun 24 '14 at 12:20
  • This is now no longer specific just to Saxon but is also supported in the libxslt/xsltproc sources. See the details at the end of http://stackoverflow.com/questions/3387127/set-html5-doctype-with-xslt/42048575#42048575 – sideshowbarker Feb 07 '17 at 17:05
10

Use doctype-system instead of doctype-public

Jirka Kosek
10

You must use XHTML 1.0 Strict as the doctype if you want XHTML output consistent with HTML5. libxml2's XML serializer has a special output mode, triggered by the XHTML 1.0 doctypes, that ensures the output is XHTML compatible (e.g. <br /> rather than <br/>, <div></div> rather than <div/>). doctype-system="about:legacy-compat" does not trigger this compatibility mode.

If you are happy with html output, then setting <xsl:output method="html"> should do the right thing. You can then set the doctype with <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;</xsl:text>, though this will need plumbing in at the appropriate place as XDV does not support this yet.

In fact it seems <xsl:output method="html"/> does not really help either - this will result in <br/> being output as <br></br>.

Russia Must Remove Putin
Laurence Rowe
6

This variation of Jirka Kosek's advice, via Advanced XDV theming on Plone.org, seems to work for me in collective.xdv:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output
      doctype-public="HTML"
      doctype-system=""/>
</xsl:stylesheet>
Community
Jon Hadley
  • 1
    Yes, but as I've commented on 0xA3's answer, an empty @doctype-system or @doctype-public is not standard (it is also against the spec!) –  Aug 02 '10 at 16:19
5

This is a comment, but I do not have enough karma points to put it in the correct place. Sigh.

I appreciate this is probably the correct, standards driven way to accomplish what I want (I've upvoted it as such). But the former isn't supported (my processor falls over) and the latter still results in "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" in my doctype. As @Jirka Kosek suggested, I think my XSLT processor might be broken.

No, your XSLT processor is not broken, it's just that XDV adds:

<xsl:output method="xml" indent="no" omit-xml-declaration="yes" media-type="text/html" encoding="utf-8" doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

by default, so when you add a second <xsl:output doctype-system="about:legacy-compat"/> the previous doctype-public is not overwritten.
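
A minimal sketch of that situation, assuming the compiled stylesheet ends up with both declarations at the same import precedence (where exactly XDV places them is an assumption here, and the root template is a placeholder). XSLT 1.0 merges all xsl:output elements attribute by attribute, so an attribute set by only one of them stays in effect.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Default added by XDV (abbreviated from the declaration quoted above) -->
  <xsl:output method="xml"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

  <!-- Declaration added by the theme author: sets doctype-system only -->
  <xsl:output doctype-system="about:legacy-compat"/>

  <!-- doctype-public appears in the first declaration only, so nothing
       replaces it in the merged result. doctype-system is specified twice
       at the same precedence, which XSLT 1.0 treats as an error that a
       processor may recover from by using the value that occurs last. -->

  <!-- Placeholder template so the sketch is a complete stylesheet -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Example</title>
      </head>
      <body/>
    </html>
  </xsl:template>
</xsl:stylesheet>
```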

Note that XHTML 1.0 strict is listed as an obsolete permitted doctype string, so it is perfectly acceptable to use this doctype and still call it HTML5.

Laurence Rowe
  • If your XSLT processor adds elements to your stylesheets or has some non-standard attribute default values, that would mean it's broken. –  Apr 21 '11 at 02:42
  • 6
    @Alejandro: XDV (now renamed diazo) is not an XSLT processor, it is a theme -> XSLT compiler. It is XDV which is adding the default values into the compiled XSLT. I know this because I wrote it ;) – Laurence Rowe Apr 22 '11 at 10:52
3

Sorry to only provide links: this was discussed within the WHATWG, but it has been many months since I dealt with it. Here Ian Hickson and some XML experts discuss it:
http://lists.w3.org/Archives/Public/public-html/2009Jan/0640.html
http://markmail.org/message/64aykbbsfzlbidzl
Here is the actual issue number:
http://www.w3.org/html/wg/tracker/issues/54
And here is another discussion:
http://www.contentwithstyle.co.uk/content/xslt-and-html-5-problems

Rob
2

Use this tag:

<xsl:output method="xml" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" doctype-public="XSLT-compat" indent="yes"/>
1

This is what I use to generate a compatible HTML5 doctype (Saxon emits its HTML5 doctype; otherwise the legacy doctype is used):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/TR/REC-html40">

    <xsl:output
        method="html"
        version="5.0"
        doctype-system="about:legacy-compat"
        encoding="UTF-8"
        indent="yes" />
BananaAcid
1

The following code will work as a standalone template if saved as html5.xml:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="html5.xml"?>
<xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml"
            >
<xsl:output method="xml" encoding="utf-8" version="" indent="yes" standalone="no" media-type="text/html" omit-xml-declaration="no" doctype-system="about:legacy-compat" />

<xsl:template match="xsl:stylesheet">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="/">
  <html>
    <head>
      <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
    </head>
    <body>
      <xsl:text>hi</xsl:text>
    </body>
  </html>
</xsl:template>

</xsl:stylesheet>

Paul Sweatte