I'm working on a simple HTML to Simple DocBook converter (the HTML uses a restricted subset of markup, so this should be doable). I started with XSLT 1.0 and it sort of worked, but handling the nesting got quite complex, so I took a suggestion to switch to XSLT 2.0, which has `xsl:for-each-group` with `group-starting-with`.

Input is:

<html>
<body>
<p>P 1</p>
<p>P 2</p>

<h2>T 1</h2>
<p>T 1#P 1</p>
    <h3>T 1.1</h3>
    <p>T 1.1#P 1</p>

<h2>T 2</h2>
<p>T 2#P 1</p>
    <h3>T 2.1</h3>
    <p>T 2.1#P 1</p>
    <ul>
        <li>T 2.1#UL 1</li>
    </ul>
    <p>T 2.1#P 2</p>

    <h3>T 2.2</h3>
    <p>T 2.2#P 1</p>

<h2>T 3</h2>
<p>T 3#P 1</p>

</body>
</html>

And the current stylesheet is:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/html"><xsl:apply-templates/></xsl:template>
    <xsl:template match="body">
        <xsl:for-each-group select="*" group-starting-with="h2">
            <xsl:choose>
                <xsl:when test="current-grouping-key()">
                    <section title="{self::h2}">
                        <xsl:for-each-group select="current-group()" group-starting-with="h3">
                            <section title="{self::h3}">
                                <xsl:apply-templates select="."/>
                            </section>
                        </xsl:for-each-group>
                    </section>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each-group>
    </xsl:template>

    <xsl:template match="h2">
        <title><xsl:apply-templates select="*|text()"/></title>
    </xsl:template>
    <xsl:template match="h3">
        <title><xsl:apply-templates select="*|text()"/></title>
    </xsl:template>
    <xsl:template match="p">
        <para><xsl:apply-templates select="*|text()"/></para>
    </xsl:template>
    <xsl:template match="ul">
        <itemizedlist><xsl:apply-templates select="li"/></itemizedlist>
    </xsl:template>
    <xsl:template match="ol">
        <orderedlist><xsl:apply-templates select="li"/></orderedlist>
    </xsl:template>
    <xsl:template match="li">
        <listitem><para><xsl:apply-templates select="*|text()"/></para></listitem>
    </xsl:template>
</xsl:stylesheet>

Expected output is (note the "orphaned paragraphs" at the top):

<para>P 1</para>
<para>P 2</para>

<section>
    <title>T 1</title>
    <para>T 1#P 1</para>
    <section>
        <title>T 1.1</title>
        <para>T 1.1#P 1</para>
    </section>
</section>

<section>
    <title>T 2</title>
    <para>T 2#P 1</para>
    <section>
        <title>T 2.1</title>
        <para>T 2.1#P 1</para>
        <itemizedlist>
            <listitem><para>T 2.1#UL 1</para></listitem>
        </itemizedlist>
        <para>T 2.1#P 2</para>
    </section>
    <section>
        <title>T 2.2</title>
        <para>T 2.2#P 1</para>
    </section>
</section>
<section>
    <title>T 3</title>
    <para>T 3#P 1</para>
</section>

But I'm not exactly sure how the nesting here is supposed to work. I did find some similar examples, but they are much simpler than what I need here.

  • If you use `group-starting-with`, `current-grouping-key()` is empty. You need to check `self::h1`, or more generally `self::*[local-name() = concat('h', $level)]`, in a recursive function or template that takes a `level` parameter and increments it for each heading level you look for. There are some examples on StackOverflow, like https://stackoverflow.com/a/14342764/252228. I think Michael Kay also has a paper online using a recursive template. – Martin Honnen Apr 30 '20 at 14:46
  • https://stackoverflow.com/a/31985976/252228 is another recursive answer, maybe closer to your input sample. – Martin Honnen Apr 30 '20 at 14:47
  • I always assumed I kind of know XSLT which I've used many times before, but these answers show me I don't really know anything. :) Thank you very much for your answers, both here and there before, your passion for XSLT is very well coupled with your obvious mastery of it. Would you mind copy/pasting the link in the actual answer and I'll accept it. Thanks very much. – Dalibor Karlović May 05 '20 at 09:06
  • I have marked your question as a duplicate as it seems the previous question/answer has helped you solve the problem. So in the spirit of the SO rules I think it is better marking your question as a duplicate instead of repeating a link to it in a new answer. – Martin Honnen May 05 '20 at 09:28
  • Even better, didn't know SO offers that option. – Dalibor Karlović May 05 '20 at 11:34
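
For readers who land here, the recursive approach described in the first comment could look roughly like the sketch below. It is not taken from the linked answers. The `mf` prefix, its namespace URI, and the function name `mf:group` are made up for illustration, and starting at level 2 is an assumption based on the sample input, where `h2` is the top-level heading. The function groups its input with `group-starting-with`, tests `self::*` instead of `current-grouping-key()`, and calls itself with `$level + 1` on the rest of each group.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs mf">

    <xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>

    <!-- Hypothetical helper (name and namespace are assumptions): wraps every run
         of elements that starts with a heading of the given level in a section,
         then recurses with the next heading level on the rest of that run. -->
    <xsl:function name="mf:group" as="node()*">
        <xsl:param name="nodes" as="element()*"/>
        <xsl:param name="level" as="xs:integer"/>
        <xsl:for-each-group select="$nodes"
                group-starting-with="*[local-name() = concat('h', $level)]">
            <xsl:choose>
                <!-- The group really starts with a heading of this level. -->
                <xsl:when test="self::*[local-name() = concat('h', $level)]">
                    <section>
                        <title><xsl:apply-templates/></title>
                        <xsl:sequence select="mf:group(subsequence(current-group(), 2), $level + 1)"/>
                    </section>
                </xsl:when>
                <!-- Leading group without a heading of this level: try deeper levels. -->
                <xsl:when test="$level lt 6">
                    <xsl:sequence select="mf:group(current-group(), $level + 1)"/>
                </xsl:when>
                <!-- No heading levels left: transform the elements as they are. -->
                <xsl:otherwise>
                    <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each-group>
    </xsl:function>

    <!-- Assumption: h2 is the outermost heading level, as in the sample input. -->
    <xsl:template match="/">
        <xsl:sequence select="mf:group(html/body/*, 2)"/>
    </xsl:template>

    <xsl:template match="p">
        <para><xsl:apply-templates/></para>
    </xsl:template>
    <xsl:template match="ul">
        <itemizedlist><xsl:apply-templates select="li"/></itemizedlist>
    </xsl:template>
    <xsl:template match="ol">
        <orderedlist><xsl:apply-templates select="li"/></orderedlist>
    </xsl:template>
    <xsl:template match="li">
        <listitem><para><xsl:apply-templates/></para></listitem>
    </xsl:template>
</xsl:stylesheet>
```

With an XSLT 2.0 processor such as Saxon, this should keep the two leading paragraphs outside any section and nest each `h3` section inside its `h2` section, as in the expected output in the question. The headings become `title` child elements here rather than `title` attributes, which is what the expected output asks for.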
