
I have an XML document and I am trying to get the distinct leaf-node paths, starting from the root's children.

XML:

<?xml version="1.0" encoding="utf-8" ?>
<root>
    <class>
        <city>Test Data</city>
        <activity_version_id>Test Data</activity_version_id>
        <event_id>Test Data</event_id>
    </class>
    <class>
        <city>Test Data</city>
        <activity_version_id>Test Data</activity_version_id>
        <event_id>Test Data</event_id>
    </class>
</root>

XSL:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text" indent="no" />

    <xsl:template match="*[not(*)]">
        <xsl:for-each select="ancestor-or-self::*">
          <xsl:if test="name(/*) != name(current())">
            <xsl:value-of select="name()"/>

            <xsl:if test="count(descendant::*) != 0">
                <xsl:value-of select="concat('.','')"/>
            </xsl:if>
          </xsl:if>
        </xsl:for-each>
        <xsl:text>&#44;&#xA;</xsl:text>
        <xsl:apply-templates select="*"/>
    </xsl:template>

    <xsl:template match="*">
        <xsl:apply-templates select="*"/>
    </xsl:template>

</xsl:stylesheet>

Actual:

class.city,
class.activity_version_id,
class.event_id,
class.city,
class.activity_version_id,
class.event_id,

But I want each distinct node path to appear only once, like this:

class.city
class.activity_version_id
class.event_id

The XSLT processor vendor is Apache Software Foundation.

Please help. Thanks in advance.

Abhishekh Gupta

2 Answers


SAXON 9.3.0.5 from Saxonica

That's good: it means you can use XSLT 2.0. Try:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="utf-8"/>

<xsl:variable name="paths">
    <xsl:apply-templates select="/*"/>
</xsl:variable>

<xsl:template match="/">
    <xsl:value-of select="distinct-values($paths/path)" separator="&#10;"/>
</xsl:template>

<xsl:template match="*[not(*)]">
    <path>
        <xsl:value-of select="ancestor-or-self::*/name()" separator="."/>
    </path>
</xsl:template>

</xsl:stylesheet>
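
Note that the stylesheet above starts every path at the root element (root.class.city), while the question asks for paths that start at the root's children. A minimal sketch of one way to drop the root, assuming an XSLT 2.0 processor such as Saxon 9: the `[parent::*]` predicate and the inline `for` expression are illustrative choices, not part of the stylesheet above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8"/>

  <xsl:template match="/">
    <!-- One dotted path string per leaf element. The [parent::*] predicate
         keeps only elements whose parent is an element, which skips the
         root element (its parent is the document node). -->
    <xsl:variable name="paths" select="
        for $leaf in //*[not(*)]
        return string-join($leaf/ancestor-or-self::*[parent::*]/name(), '.')"/>
    <!-- distinct-values() removes duplicates; the specification leaves the
         order of its result implementation-dependent. -->
    <xsl:value-of select="distinct-values($paths)" separator="&#10;"/>
  </xsl:template>

</xsl:stylesheet>
```

If the root element is itself a leaf, this sketch produces an empty path for it, so such documents would need separate handling.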

Edit:

I got a problem. I have one other server whose XSLT processor is Apache Software Foundation and I am not able to transform it.

For Apache Xalan, try:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
xmlns:set="http://exslt.org/sets"
extension-element-prefixes="exsl set">
<xsl:output method="text" encoding="utf-8"/>

<xsl:variable name="paths">
    <xsl:apply-templates select="/*"/>
</xsl:variable>

<xsl:template match="/">
    <xsl:for-each select="set:distinct(exsl:node-set($paths)/path)">
        <xsl:value-of select="."/>
        <xsl:if test="position()!=last()">
            <xsl:text>&#10;</xsl:text>
        </xsl:if>
    </xsl:for-each>
</xsl:template>

<xsl:template match="*[not(*)]">
    <path>
    <xsl:for-each select="ancestor-or-self::*">
        <xsl:value-of select="name()"/>
        <xsl:if test="position()!=last()">
            <xsl:text>.</xsl:text>
        </xsl:if>
    </xsl:for-each>
    </path>
</xsl:template>

</xsl:stylesheet>
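
As with any path built from `ancestor-or-self::*`, the XSLT 1.0 stylesheet above includes the root element's name in each path. A minimal sketch of a variant that leaves it out, assuming a processor that supports the EXSLT `exsl:node-set()` and `set:distinct()` functions (Xalan does): the `[parent::*]` predicate and the trailing line feed after every path are assumptions made to match the output requested in the question.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    xmlns:set="http://exslt.org/sets"
    extension-element-prefixes="exsl set">
  <xsl:output method="text" encoding="utf-8"/>

  <xsl:template match="/">
    <!-- Collect one <path> element per leaf in a result tree fragment -->
    <xsl:variable name="paths">
      <xsl:for-each select="//*[not(*)]">
        <path>
          <!-- [parent::*] skips the root element, whose parent is the
               document node rather than an element -->
          <xsl:for-each select="ancestor-or-self::*[parent::*]">
            <xsl:value-of select="name()"/>
            <xsl:if test="position() != last()">.</xsl:if>
          </xsl:for-each>
        </path>
      </xsl:for-each>
    </xsl:variable>

    <!-- Convert the fragment to a node-set, then keep one <path> per
         distinct string value -->
    <xsl:for-each select="set:distinct(exsl:node-set($paths)/path)">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the sample document in the question, this should list class.city, class.activity_version_id and class.event_id once each.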
michael.hor257k
  • nice solution! just two additions to format output as @Beginner wants to: `separator=", "` (comma added) and `` to avoid output of the root element – leu Mar 27 '15 at 10:57

What about this XSLT 1.0 solution? No need for two different stylesheets!

  • No extension functions used (no EXSLT set:distinct(), no exsl:node-set())
  • Completely portable between any two XSLT processors -- due to the above
  • Single-pass (no multi-pass processing, no intermediate results and no need to convert RTFs to temporary trees)
  • No explicit conditional XSLT instructions and no <xsl:for-each>
  • Adjustable to a maximum depth -- the key below looks at the leaf and its five nearest ancestors, and a depth of 30 would possibly work in 99.999% of cases
  • Using keys (Muenchian grouping) and thus very fast
<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:key name="kNodeByPath" match="*[not(*)]" 
    use="concat(name(), '/', name(..), '/', name(../..), '/', name(../../..), 
                '/', name(../../../..), '/', name(../../../../..))"/>

  <xsl:template match=
  "*[not(*)][generate-id() 
            = generate-id(key('kNodeByPath',  
                               concat(name(), '/', name(..), '/', name(../..),  
                                      '/', name(../../..), '/', name(../../../..), 
                                      '/', name(../../../../..)))[1])
            ]">
    <xsl:apply-templates select="ancestor::*[parent::*]" mode="path"/>
    <xsl:value-of select="name()"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

  <xsl:template match="*" mode="path">
    <xsl:value-of select="concat(name(), '.')"/>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>

When this transformation is applied on the provided source XML document:

<root>
    <class>
        <city>Test Data</city>
        <activity_version_id>Test Data</activity_version_id>
        <event_id>Test Data</event_id>
    </class>
    <class>
        <city>Test Data</city>
        <activity_version_id>Test Data</activity_version_id>
        <event_id>Test Data</event_id>
    </class>
</root>

The wanted, correct result is produced:

class.city
class.activity_version_id
class.event_id
Dimitre Novatchev
  • "*No extension function used*" You're saying it like that's a good thing... Extension functions are there to be used by supporting processors. Refusing to take advantage of them in the name of some abstract ideal of "portability" is IMHO not the optimal strategy. -- **If** I were asked to make this generic for any XSLT 1.0 processor, I would still rely on the `exsl:node-set()` function, rather than assume there's a limit to the depth of the source document. – michael.hor257k Mar 29 '15 at 06:44
  • @michael.hor257k, If someone is already bound to a particular vendor/set-of-tools, portability wouldn't matter to them. On the other side, there are developers and organizations whose goal is maximum portability and widening of the applicability of their products. Nowadays using XSLT 1.0 is like driving a car from the 1920s, but if some people still do use XSLT 1.0, they might want their transformations to run with MSXML 3, 4, 6 and with other XSLT 1.0 processors. Writing two or more transformations simply isn't affordable. And, BTW, MSXML doesn't support `exsl:node-set()` – Dimitre Novatchev Mar 29 '15 at 16:04
  • @michael.hor257k, As for the maximum depth, do try to find a real-world, non-fictitious XML document with depth greater than 40, and if you succeed, beer is on me whenever we meet :). My guess is that the maximum depth of existing documents is considerably smaller. Using a known maximum depth is an example of *sentinel programming*, and the purpose of sentinel programming is to simplify an algorithm/code by ensuring weird things are not going to happen, and thus not including checks and handling for such conditions. – Dimitre Novatchev Mar 29 '15 at 16:15
  • I am afraid I remain unconvinced. The only thing I would add is this: the difference between moving to a processor that cannot handle your existing stylesheet/s on the one hand, and encountering an XML document that doesn't conform to your assumptions on the other hand, is that the former is done consciously - and therefore the consequences can be controlled. – michael.hor257k Mar 29 '15 at 17:19
  • @michael.hor257k, You may remain "unconvinced", but for other people that are reading these comments: It *is* the right thing, when you know your limitations, *not* to worry about anything that falls outside of these limitations. Concentrate your energy on the *real* world if you want to be efficient. – Dimitre Novatchev Mar 29 '15 at 17:30
  • michael.hor257k, Dimitre Novatchev, Both solutions are nice. If I have a choice then I would go with a more generic processor independent solution. Thanks for your response. Cheers :) – Abhishekh Gupta Mar 30 '15 at 06:27
  • @Beginner, You are welcome! And yes, using a portable and less-dependent solution is the right approach in most situations. – Dimitre Novatchev Mar 30 '15 at 14:10