I'm working on a simple HTML to Simple DocBook converter (the HTML only uses a subset of the allowed markup, so it should be doable). I started with XSLT 1.0 and it sort of worked, but the nesting made it quite complex, so someone suggested switching to XSLT 2.0, which has group-starting-with.
Input is:
<html>
  <body>
    <p>P 1</p>
    <p>P 2</p>
    <h2>T 1</h2>
    <p>T 1#P 1</p>
    <h3>T 1.1</h3>
    <p>T 1.1#P 1</p>
    <h2>T 2</h2>
    <p>T 2#P 1</p>
    <h3>T 2.1</h3>
    <p>T 2.1#P 1</p>
    <ul>
      <li>T 2.1#UL 1</li>
    </ul>
    <p>T 2.1#P 2</p>
    <h3>T 2.2</h3>
    <p>T 2.2#P 1</p>
    <h2>T 3</h2>
    <p>T 3#P 1</p>
  </body>
</html>
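To see how group-starting-with partitions these elements, a small diagnostic stylesheet can list each group that an outer xsl:for-each-group over the body's children would form. This is only a sketch, assuming an XSLT 2.0 processor such as Saxon; the groups/group elements and the first/members attributes are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <groups>
      <!-- Same grouping as the outer loop over body/* -->
      <xsl:for-each-group select="/html/body/*" group-starting-with="h2">
        <!-- Context item = first element of the group; current-group() = all of it.
             A sequence in an attribute value template is joined with single spaces. -->
        <group first="{name()}" members="{for $e in current-group() return name($e)}"/>
      </xsl:for-each-group>
    </groups>
  </xsl:template>
</xsl:stylesheet>
```

For the input above it should report four groups: "p p" (the paragraphs before the first h2), "h2 p h3 p", "h2 p h3 p ul p h3 p" and "h2 p". Each h2 group still contains its h3 elements, which is why a second grouping has to run inside each h2 group.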
And the current stylesheet is:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/html"><xsl:apply-templates/></xsl:template>
  <xsl:template match="body">
    <xsl:for-each-group select="*" group-starting-with="h2">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <section title="{self::h2}">
            <xsl:for-each-group select="current-group()" group-starting-with="h3">
              <section title="{self::h3}">
                <xsl:apply-templates select="."/>
              </section>
            </xsl:for-each-group>
          </section>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>
  <xsl:template match="h2">
    <title><xsl:apply-templates select="*|text()"/></title>
  </xsl:template>
  <xsl:template match="h3">
    <title><xsl:apply-templates select="*|text()"/></title>
  </xsl:template>
  <xsl:template match="p">
    <para><xsl:apply-templates select="*|text()"/></para>
  </xsl:template>
  <xsl:template match="ul">
    <itemizedlist><xsl:apply-templates select="li"/></itemizedlist>
  </xsl:template>
  <xsl:template match="ol">
    <orderedlist><xsl:apply-templates select="li"/></orderedlist>
  </xsl:template>
  <xsl:template match="li">
    <listitem><para><xsl:apply-templates select="*|text()"/></para></listitem>
  </xsl:template>
</xsl:stylesheet>
The expected output is (note the "orphaned" paragraphs at the top, before the first h2):
<para>P 1</para>
<para>P 2</para>
<section>
  <title>T 1</title>
  <para>T 1#P 1</para>
  <section>
    <title>T 1.1</title>
    <para>T 1.1#P 1</para>
  </section>
</section>
<section>
  <title>T 2</title>
  <para>T 2#P 1</para>
  <section>
    <title>T 2.1</title>
    <para>T 2.1#P 1</para>
    <itemizedlist>
      <listitem><para>T 2.1#UL 1</para></listitem>
    </itemizedlist>
    <para>T 2.1#P 2</para>
  </section>
  <section>
    <title>T 2.2</title>
    <para>T 2.2#P 1</para>
  </section>
</section>
<section>
  <title>T 3</title>
  <para>T 3#P 1</para>
</section>
But I'm not sure how the nesting is supposed to work here. I found some similar examples, but they are much simpler than what I need.
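A sketch of how the two levels could be nested, assuming an XSLT 2.0 processor (e.g. Saxon) and that only h2 and h3 act as section headings. It differs from the stylesheet above in three places. First, the test checks self::h2: inside xsl:for-each-group the context item is the first element of the current group, while current-grouping-key() only has a value with group-by or group-adjacent, not with group-starting-with. Second, the heading is turned into a child title element by applying templates to it, instead of a title attribute on section. Third, the inner grouping runs over the rest of the h2 group (current-group() except .) and applies templates to the whole current-group(), not just to its first element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>
  <!-- Identity template: copies anything without a more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/html">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="body">
    <!-- Level 1: a new group starts at every h2; elements before the
         first h2 form a leading group of their own -->
    <xsl:for-each-group select="*" group-starting-with="h2">
      <xsl:choose>
        <!-- The context item is the first element of the current group -->
        <xsl:when test="self::h2">
          <section>
            <!-- The h2 itself becomes the section's title element -->
            <xsl:apply-templates select="."/>
            <!-- Level 2: split the rest of this h2 group at each h3 -->
            <xsl:for-each-group select="current-group() except ." group-starting-with="h3">
              <xsl:choose>
                <xsl:when test="self::h3">
                  <section>
                    <!-- The h3 (as title) plus everything up to the next h3 -->
                    <xsl:apply-templates select="current-group()"/>
                  </section>
                </xsl:when>
                <xsl:otherwise>
                  <!-- Content between the h2 and its first h3 -->
                  <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:for-each-group>
          </section>
        </xsl:when>
        <xsl:otherwise>
          <!-- Orphaned content before the first h2 -->
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>
  <xsl:template match="h2|h3">
    <title><xsl:apply-templates select="*|text()"/></title>
  </xsl:template>
  <xsl:template match="p">
    <para><xsl:apply-templates select="*|text()"/></para>
  </xsl:template>
  <xsl:template match="ul">
    <itemizedlist><xsl:apply-templates select="li"/></itemizedlist>
  </xsl:template>
  <xsl:template match="ol">
    <orderedlist><xsl:apply-templates select="li"/></orderedlist>
  </xsl:template>
  <xsl:template match="li">
    <listitem><para><xsl:apply-templates select="*|text()"/></para></listitem>
  </xsl:template>
</xsl:stylesheet>
```

On the sample input this should give the nesting shown in the expected output above (whitespace may differ). Deeper heading levels (h4 and so on) would need another nested xsl:for-each-group in the same pattern, or a recursive xsl:function or named template that takes the heading level as a parameter; that variant is not shown here.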