
I'm trying to write a simple translator from XHTML to Simplified DocBook (the input XHTML is a limited subset, so it should be doable).

I have this stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes" standalone="no"/>
    <!--
    <xsl:strip-space elements="*"/>
    -->

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>

    <!-- skip implicit tags-->
    <xsl:template match="/html/body"><xsl:apply-templates/></xsl:template>
    <xsl:template match="/html"><xsl:apply-templates/></xsl:template>

    <!-- h2 headings to sections converter -->
    <xsl:template match="h2">
        <xsl:variable name="title" select="generate-id(.)"/>
        <section>
            <title><xsl:apply-templates select="text()"/></title>
            <xsl:for-each select="following-sibling::*[generate-id(preceding-sibling::h2[1]) = $title and not(self::h2)]">
                <xsl:apply-templates/>
            </xsl:for-each>
        </section>
    </xsl:template>

    <xsl:template match="p">
        <para><xsl:apply-templates select="*|text()"/></para>
    </xsl:template>
    <xsl:template match="p[preceding-sibling::h2]"/>
    <xsl:template match="ul">
        <itemizedlist><xsl:apply-templates select="li"/></itemizedlist>
    </xsl:template>
    <xsl:template match="ul[preceding-sibling::h2]"/>
    <xsl:template match="ol">
        <orderedlist><xsl:apply-templates select="li"/></orderedlist>
    </xsl:template>
    <xsl:template match="ol[preceding-sibling::h2]"/>
    <xsl:template match="li">
        <listitem><para><xsl:apply-templates select="*|text()"/></para></listitem>
    </xsl:template>
</xsl:stylesheet>

For this input:

<html>
<body>
<p>First paragraph</p>
<p>Second paragraph</p>
<h2>First title</h2>
<p>First paragraph</p>
<p>Second paragraph</p>
<p>Third paragraph</p>
<h2>Second title</h2>
<p>First paragraph</p>
<ul>
    <li>A list item</li>
    <li>Another list item</li>
</ul>
<p>Second paragraph</p>
</body>
</html>

I expect this output:

<para>First paragraph</para>
<para>Second paragraph</para>
<section>
    <title>First title</title>
    <para>First paragraph</para>
    <para>Second paragraph</para>
    <para>Third paragraph</para>
</section>
<section>
    <title>Second title</title>
    <para>First paragraph</para>
    <itemizedlist>
        <listitem><para>A list item</para></listitem>
        <listitem><para>Another list item</para></listitem>
    </itemizedlist>
    <para>Second paragraph</para>
</section>

But I get:

<para>First paragraph</para>
<para>Second paragraph</para>
<section><title>First title</title>First paragraphSecond paragraphThird paragraph</section>



<section><title>Second title</title>First paragraph
    <listitem><para>A list item</para></listitem>
    <listitem><para>Another list item</para></listitem>
Second paragraph</section>

For some reason, the templates for my paragraphs and lists are not being applied. I'm guessing it's because the empty templates are the ones matching, but I need those to prevent the same elements from being output a second time outside the sections.

How can I make this work? Thanks in advance.

1 Answer

Use

        <xsl:for-each select="following-sibling::*[generate-id(preceding-sibling::h2[1]) = $title and not(self::h2)]">
            <xsl:apply-templates select="."/>
        </xsl:for-each>

or simply

        <xsl:apply-templates select="following-sibling::*[generate-id(preceding-sibling::h2[1]) = $title and not(self::h2)]"/>

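to process the elements you want to wrap into a section. Your `<xsl:apply-templates/>` inside the `for-each` processes the children of each selected element (its text nodes and, for the `ul`, its `li` elements), not the element itself, which is why the text ends up without `para` wrappers. But once you apply templates to the elements themselves, they collide with your empty templates: `p[preceding-sibling::h2]` has a higher default priority (0.5) than `p` (0), so the empty template wins. Using a mode helps to separate the two kinds of processing: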

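<xsl:template match="h2">
    <xsl:variable name="title" select="generate-id(.)"/>
    <section>
        <title><xsl:apply-templates select="text()"/></title>
        <!-- process the section content in "wrapped" mode to bypass the empty templates -->
        <xsl:apply-templates mode="wrapped" select="following-sibling::*[generate-id(preceding-sibling::h2[1]) = $title and not(self::h2)]"/>
    </section>
</xsl:template>

<xsl:template match="p" mode="wrapped">
    <para><xsl:apply-templates select="*|text()"/></para>
</xsl:template>
<xsl:template match="p[preceding-sibling::h2]"/>
<xsl:template match="ul" mode="wrapped">
    <itemizedlist><xsl:apply-templates select="li" mode="wrapped"/></itemizedlist>
</xsl:template>
<xsl:template match="ul[preceding-sibling::h2]"/>
<xsl:template match="ol" mode="wrapped">
    <orderedlist><xsl:apply-templates select="li" mode="wrapped"/></orderedlist>
</xsl:template>
<xsl:template match="ol[preceding-sibling::h2]"/>
<xsl:template match="li" mode="wrapped">
    <listitem><para><xsl:apply-templates select="*|text()"/></para></listitem>
</xsl:template>

You still need your mode-less `p`, `ul`, `ol` and `li` templates for the content before the first `h2`.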
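For comparison, here is a sketch of a mode-free XSLT 1.0 variant along the lines of the filtering idea mentioned in the comments: the `body` template applies templates only to the `h2` elements and to the content before the first `h2`, so neither the empty blocking templates nor a mode are needed. The `select` expression is my own illustration, and the stylesheet assumes the flat input structure shown in the question (a `body` containing only `h2`, `p`, `ul` and `ol` children); it is not a tested, general solution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- identity template: copies inline markup inside p and li unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
    </xsl:template>

    <xsl:template match="/html"><xsl:apply-templates select="body"/></xsl:template>

    <!-- assumption: flat body; only h2 and the content before the first h2
         are processed here, everything else is reached via its h2 -->
    <xsl:template match="body">
        <xsl:apply-templates select="h2 | *[not(preceding-sibling::h2)]"/>
    </xsl:template>

    <!-- each h2 becomes a section holding its siblings up to the next h2 -->
    <xsl:template match="h2">
        <xsl:variable name="id" select="generate-id(.)"/>
        <section>
            <title><xsl:apply-templates/></title>
            <xsl:apply-templates select="following-sibling::*[not(self::h2)][generate-id(preceding-sibling::h2[1]) = $id]"/>
        </section>
    </xsl:template>

    <xsl:template match="p"><para><xsl:apply-templates/></para></xsl:template>
    <xsl:template match="ul"><itemizedlist><xsl:apply-templates select="li"/></itemizedlist></xsl:template>
    <xsl:template match="ol"><orderedlist><xsl:apply-templates select="li"/></orderedlist></xsl:template>
    <xsl:template match="li"><listitem><para><xsl:apply-templates/></para></listitem></xsl:template>
</xsl:stylesheet>
```

Because every element is reached exactly once (either directly from `body` or through its `h2`), each element type needs only one transforming template.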
Martin Honnen
  • @DaliborKarlović, that is due to your other templates; I think a mode can help separate the two processing approaches (blocking items or transforming them). – Martin Honnen Apr 23 '20 at 09:24
  • I have solved it in the meantime using modes, yes, but it seems weird because I need to repeat all the non-empty templates both with and without the mode. I'm not an expert in using modes, so I might be doing something wrong. – Dalibor Karlović Apr 23 '20 at 09:37
  • @DaliborKarlović, with XSLT 2 or 3 and `xsl:for-each-group select="*" group-starting-with="h2"` this would be easier. With the XSLT 1 approach you might want to try to filter out the elements you need as top-level children in an apply-templates of the `body`, e.g. ``; then you might not need the blocking/empty templates, could avoid introducing the mode, and could just keep the templates that transform e.g. `p` to `para`. – Martin Honnen Apr 23 '20 at 09:42
  • Sadly, I can't use XSLT 2.0, since this is libxml-based. Thank you very much for the help and for confirming I'm heading the right way. – Dalibor Karlović Apr 24 '20 at 10:59
  • You were right, BTW: with nested groups (deeper headings creating their own subgroups), this seems to be impossible with XSLT 1.0. I've switched to XSLT 2.0, but it isn't as simple as I anticipated either: https://stackoverflow.com/q/61525643/672885 – Dalibor Karlović Apr 30 '20 at 14:26
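The `xsl:for-each-group` approach suggested in the comments could look roughly like this. It is a minimal sketch assuming the flat input from the question and a single heading level (`h2` only, no nested sections). It requires an XSLT 2.0 or 3.0 processor such as Saxon, since libxslt (the XSLT engine built on libxml2) implements only XSLT 1.0.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- identity template for inline markup -->
    <xsl:template match="@*|node()">
        <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
    </xsl:template>

    <xsl:template match="/html"><xsl:apply-templates select="body"/></xsl:template>

    <!-- a new group starts at every h2; content before the first h2
         forms a group of its own whose first item is not an h2 -->
    <xsl:template match="body">
        <xsl:for-each-group select="*" group-starting-with="h2">
            <xsl:choose>
                <xsl:when test="self::h2">
                    <section>
                        <title><xsl:apply-templates/></title>
                        <xsl:apply-templates select="current-group() except ."/>
                    </section>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each-group>
    </xsl:template>

    <xsl:template match="p"><para><xsl:apply-templates/></para></xsl:template>
    <xsl:template match="ul"><itemizedlist><xsl:apply-templates select="li"/></itemizedlist></xsl:template>
    <xsl:template match="ol"><orderedlist><xsl:apply-templates select="li"/></orderedlist></xsl:template>
    <xsl:template match="li"><listitem><para><xsl:apply-templates/></para></listitem></xsl:template>
</xsl:stylesheet>
```

Inside `xsl:for-each-group` the context item is the first item of the current group, so `self::h2` tells whether the group should become a `section`, and `current-group() except .` selects the section's content without the heading itself.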