I'm trying to write an XSL stylesheet to tidy up certain XML files (Maven POMs). I want to rearrange the order of certain top-level elements, remove one element, and copy everything else as-is. An example of the original XML is:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>net.sourceforge.ondex.apps</groupId>
    <name>Ondex</name>
    <version>0.6.0-SNAPSHOT</version>
    <artifactId>installer</artifactId>
    <packaging>pom</packaging>
    <description>NSIS based Installer</description>
    <parent>
        <artifactId>apps</artifactId>
        <groupId>net.sourceforge.ondex</groupId>
        <version>0.6.0-SNAPSHOT</version>
    </parent>
    <organization>
        <name>Ondex Project</name>
        <url>http://www.ondex.org</url>
    </organization>

    <build>
    ...
    </build>
  ...
</project>

This XSL is almost working (with Saxon HE-9-7-06J):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="xs math pom"
    xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:pom="http://maven.apache.org/POM/4.0.0"
    >
    <xsl:output method="xml" indent="yes" />

    <xsl:template match="/pom:project">
        <project>
            <xsl:copy-of select="@*" />
            <xsl:apply-templates select="pom:modelVersion" />
            <xsl:apply-templates select="pom:parent" />     
            <xsl:apply-templates select="pom:groupId" />
            <xsl:apply-templates select="pom:artifactId" />
            <xsl:apply-templates select="pom:name" />
            <xsl:apply-templates select="pom:description" />
            <xsl:apply-templates
                select="node() except (pom:modelVersion|pom:parent|pom:groupId|pom:artifactId|pom:name|pom:description|pom:version)" />
        </project>
    </xsl:template>

    <!-- And the usual identity transform for all other nodes --> 
    <xsl:template match="node()|@*">
        <xsl:copy><xsl:apply-templates select="node()|@*" /></xsl:copy>
    </xsl:template>

</xsl:stylesheet>

However, the output has unwanted blank lines where the moved nodes used to be (e.g., see the lines after description, where parent originally was):

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
   <modelVersion>4.0.0</modelVersion>
   <parent>
            <artifactId>apps</artifactId>
            <groupId>net.sourceforge.ondex</groupId>
            <version>0.6.0-SNAPSHOT</version>
      </parent>
   <groupId>net.sourceforge.ondex.apps</groupId>
   <artifactId>installer</artifactId>
   <name>Ondex</name>
   <description>NSIS based Installer</description>





      <packaging>pom</packaging>


      <organization>
            <name>Ondex Project</name>
            <url>http://www.ondex.org</url>
      </organization>

      <build>
      ...
      </build>
  ...
</project>

What am I doing wrong? Note that I don't want to use xsl:strip-space, because I want to preserve spaces that are put in the original file for readability purposes.

zakmck
  • So how should the XSLT identify spaces to be deleted and keep those "that are put in the original file for readability purposes"? Your last `xsl:apply-templates` does not try to exclude any white space text. – Martin Honnen Jan 25 '17 at 18:26
  • There aren't spaces to be deleted. The XSLT is adding blank lines in the positions from which the caught elements are moved (e.g., in place of parent). – zakmck Jan 25 '17 at 18:33
  • @zakmck I am afraid you are misinterpreting what happens here. The transformation is not adding any blank lines; it copies them from the input XML. You will find it difficult to treat the whitespace text node located between one pair of adjacent elements differently from the one located between another pair. Perhaps you could distinguish them by counting how many linefeed characters they contain? – michael.hor257k Jan 25 '17 at 18:55
  • @MartinHonnen, I think you're right, I'll see what I can do by trying to match and ignore the white spaces between the tags I'm moving. Thank you. – zakmck Jan 25 '17 at 21:30

2 Answers

Use * instead of node() to select the remaining nodes. * matches only elements, so the whitespace text nodes between them are not selected:

<xsl:template match="/pom:project">
    <project>
        <xsl:copy-of select="@*" />
        <xsl:apply-templates select="pom:modelVersion" />
        <xsl:apply-templates select="pom:parent" />     
        <xsl:apply-templates select="pom:groupId" />
        <xsl:apply-templates select="pom:artifactId" />
        <xsl:apply-templates select="pom:name" />
        <xsl:apply-templates select="pom:description" />
        <xsl:apply-templates
            select="* except (pom:modelVersion|pom:parent|pom:groupId|pom:artifactId|pom:name|pom:description|pom:version)" />
    </project>
</xsl:template>

<!-- And the usual identity transform for all other nodes --> 
<xsl:template match="node()|@*">
    <xsl:copy><xsl:apply-templates select="node()|@*" /></xsl:copy>
</xsl:template>

Madeyedexter
  • Unfortunately this removes the blank lines in the original file too. To explain it better: a moved element (e.g., parent) ends up in the correct position in the output, but I see blank lines in the position where it was initially located. Any other blank line in the original XML, instead, is correctly reproduced in the output, which is what I want. – zakmck Jan 25 '17 at 18:31
  • In that case, I would suggest using `xsl:strip-space` to strip all spaces and adding new lines using `xsl:text` wherever required for readability. – Madeyedexter Jan 25 '17 at 18:49
  • Can't work, sorry. I want to reproduce the blank lines that the user has put in the original XML (and obviously I don't know where they are). The problem is that the XSLT is adding more of them. I think I understand why: node() matches something (a newline?) between the closing tag of one element and the opening tag of the next, but I can't work out what it is or how to get rid of it. – zakmck Jan 25 '17 at 18:52

OK, after the answers and comments you kindly wrote here, I've realised what's going on and found a workaround:

As @michael.hor257k explains, the problem is that the newline between matched elements (e.g., between </parent> and <organization>) is matched by the XSL as a text node and copied to the output on its own, resulting in empty lines.

<xsl:strip-space> alone isn't enough, because it removes these newlines together with the manually inserted blank lines, which I want to keep.

But it is a good start: I preprocess the XML with:

sed -E s/'^\s*$'/'<white-line\/>'/ pom.xml  | sponge pom.xml

that is, all 'true' blank lines are replaced by the tag <white-line />. Now it's easy to add this to the XSL above, in addition to <xsl:strip-space elements="*" />:

<xsl:template match="pom:white-line">
  <xsl:text>

  </xsl:text>
</xsl:template>

You might also need to remove leading and trailing blank lines first; otherwise they are replaced with <white-line /> elements outside the root element, which makes the file ill-formed and causes an error.
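
A minimal sketch of the whole preprocessing step, assuming a POSIX shell and awk. awk is swapped in for the sed one-liner here because trimming leading and trailing blank lines is awkward in sed. The function name mark_blank_lines is made up for illustration. It drops blank lines before the first and after the last non-blank line, and turns every other whitespace-only line into <white-line/>:

```shell
#!/bin/sh
# mark_blank_lines: hypothetical helper name, not part of the original answer.
# Reads XML on stdin and writes it to stdout, where:
#   - blank lines before the first / after the last non-blank line are dropped
#   - every other whitespace-only line becomes <white-line/>
mark_blank_lines() {
  awk '
    { line[NR] = $0 }                      # buffer the whole input
    /[^ \t\r]/ {                           # line has non-whitespace content
      if (!first) first = NR
      last = NR
    }
    END {
      if (!first) exit                     # input was entirely blank
      for (i = first; i <= last; i++) {
        if (line[i] ~ /^[ \t\r]*$/) print "<white-line/>"
        else print line[i]
      }
    }'
}

# Usage (writes to a temporary file instead of relying on sponge):
#   mark_blank_lines < pom.xml > pom.xml.tmp && mv pom.xml.tmp pom.xml
```

Note that this only trims the very start and end of the file. A blank line between the XML declaration and the root element would still be marked, so it would need to be removed by hand.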

Thanks for the help!

zakmck