
It's me again, with a new problem.

I would like to strip/reduce an XML structure down to only the needed elements.

To explain the problem, I built a simplified random structure.

<ROOT>
    <DATA>
        <ALLOC>
            <TYPE>Test</TYPE>
            <NAME>something text</NAME>
            <VALUE>4711</VALUE>
        </ALLOC>
        <ALLOC>
            <TYPE>Test</TYPE>
            <NAME>something text</NAME>
            <VALUE>4712</VALUE>
        </ALLOC>
        <ALLOC>
            <TYPE>Test</TYPE>
            <NAME>something text</NAME>
            <VALUE>4713</VALUE>
        </ALLOC>
    </DATA>
    <SOURCE>
        <CONNECTION>
            <TYPE>SQL</TYPE>
            <VALUE>jdbc</VALUE>
            <CSTRING>jdbc string</CSTRING>
        </CONNECTION>
        <CONNECTION>
            <TYPE>CSV</TYPE>
            <VALUE>CSV</VALUE>
            <CSTRING></CSTRING>
        </CONNECTION>
    </SOURCE>
</ROOT>

Required elements are e.g.:

/ROOT[1]/DATA[1]/ALLOC[2]/VALUE[1]
/ROOT[1]/SOURCE[1]/CONNECTION[1]/CSTRING[1]

The list of required elements comes from Java, via xmlassert.equal > xmldiff.

Now I have to strip the XML structure down to the required elements while keeping the structure (XPath) of those elements.

The desired output is:

<ROOT>
    <DATA>
        <ALLOC>
            <VALUE>4712</VALUE>
        </ALLOC>
    </DATA>
    <SOURCE>
        <CONNECTION>
            <CSTRING>jdbc string</CSTRING>
        </CONNECTION>       
    </SOURCE>
</ROOT>

The real structure is huge (at least six A4 pages if printed), complex, and nested many levels deep. The requested elements are also dynamic.

I spent the last few hours reading threads in a lot of forums, trying a large number of different XSLTs, and reading more threads.

How can I do that?

Thank you so much in advance.

Hans.Olo

2 Answers


How can I do that?

This is a short and simple XSLT 1.0 generic solution:

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="pExpressions">
      <e>/ROOT[1]/DATA[1]/ALLOC[2]/VALUE[1]</e>
      <e>/ROOT[1]/SOURCE[1]/CONNECTION[1]/CSTRING[1]</e>
    </xsl:param>        
    <xsl:variable name="vExpressions" 
                  select="document('')/*/xsl:param[@name='pExpressions']/*"/>

    <xsl:template match="*">
      <xsl:variable name="vPath">
        <xsl:apply-templates select="ancestor-or-self::*" mode="path"/>
      </xsl:variable>

      <xsl:copy-of select="self::*[$vExpressions[.=$vPath]]"/>

      <xsl:apply-templates select=
      "self::*[$vExpressions[not(.=$vPath) and starts-with(.,$vPath)]]" mode="process"/>
    </xsl:template>

    <xsl:template match="*" mode="path">
        <xsl:value-of select="concat('/',name())"/>
        <xsl:variable name="vnumPrecSiblings" select=
         "count(preceding-sibling::*[name()=name(current())])"/>
        <xsl:value-of select="concat('[', $vnumPrecSiblings +1, ']')"/>
     </xsl:template>

     <xsl:template match="*" mode="process">
       <xsl:copy>
         <xsl:apply-templates select="*"/>
       </xsl:copy>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<ROOT>
    <DATA>
        <ALLOC>
            <TYPE>Test</TYPE>
            <NAME>something text</NAME>
            <VALUE>4711</VALUE>
        </ALLOC>
        <ALLOC>
            <TYPE>Test</TYPE>
            <NAME>something text</NAME>
            <VALUE>4712</VALUE>
        </ALLOC>
        <ALLOC>
            <TYPE>Test</TYPE>
            <NAME>something text</NAME>
            <VALUE>4713</VALUE>
        </ALLOC>
    </DATA>
    <SOURCE>
        <CONNECTION>
            <TYPE>SQL</TYPE>
            <VALUE>jdbc</VALUE>
            <CSTRING>jdbc string</CSTRING>
        </CONNECTION>
        <CONNECTION>
            <TYPE>CSV</TYPE>
            <VALUE>CSV</VALUE>
            <CSTRING></CSTRING>
        </CONNECTION>
    </SOURCE>
</ROOT>

the wanted, correct result is produced:

<ROOT>
   <DATA>
      <ALLOC>
         <VALUE>4712</VALUE>
      </ALLOC>
   </DATA>
   <SOURCE>
      <CONNECTION>
         <CSTRING>jdbc string</CSTRING>
      </CONNECTION>
   </SOURCE>
</ROOT>

Explanation:

For every element in the XML document, its XPath expression (in the style specified in the question) is produced. The element is then:

  • copied completely, if its XPath expression equals one of the XPath expressions passed as parameters;
  • shallow-copied, if its XPath expression is a proper string prefix of one or more of the passed XPath expressions;
  • ignored (deleted) otherwise.

Genericity of the solution:

The input XPath expressions can be passed as an <xsl:param> when the transformation is invoked, or they can be kept in an XML file whose URI is passed to the transformation as a parameter.
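
A minimal sketch of the external-file variant, assuming the expressions are stored in a separate document. The parameter name pExprUri, the file name expressions.xml, and its <exprs>/<e> layout are hypothetical and not part of the original answer. Only the two top-level declarations change; the three templates stay exactly as shown above.

```xml
<!-- Hypothetical parameter: URI of the file that lists the wanted paths.
     A relative URI is resolved against the stylesheet's base URI. -->
<xsl:param name="pExprUri" select="'expressions.xml'"/>

<!-- Assumed file layout (one <e> per path, no surrounding whitespace):
       <exprs>
         <e>/ROOT[1]/DATA[1]/ALLOC[2]/VALUE[1]</e>
         <e>/ROOT[1]/SOURCE[1]/CONNECTION[1]/CSTRING[1]</e>
       </exprs>
-->
<xsl:variable name="vExpressions" select="document($pExprUri)/*/e"/>
```

Because the templates only compare string values of $vExpressions, they should not need to know where the list came from.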

Note:

I spent the last few hours reading threads in a lot of forums, trying a large number of different XSLTs, and reading more threads.

For a more involved and elegant way of producing an XPath expression for every type of node, see this answer.

Dimitre Novatchev

As I understand it, you want an XSLT that will take a sequence of XPath expressions, then reduce the input XML to only those elements that match the XPath expressions and their ancestors.

You don't give any indication of which XSLT version you want to use, or which processor you'll be using, so it's difficult to give you good example code. Instead I'll outline a few options I think you can choose from:

  1. Generate some XSLT (using XSLT?) like that in @michael.hor257k's answer, using the XPath statements as input, and run that XSLT on your input. This will probably scale well, but it requires a decent initial investment and is more complex to write than the other options.
  2. Use xsl:key and the key() function to define the elements you want to keep. Remember that you also want to keep all of their ancestors.
  3. Use functions, parameters, or called templates to evaluate whether the element you are examining has an XPath address that matches any entry in your list of XPaths or any of their ancestors. You can probably use parameters to save a bunch of processing time.
  4. Something involving saxon:parse() or some other custom function that may or may not be available in your environment.
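
Option 2 could look roughly like the following XSLT 2.0 sketch. It is untested and rests on several assumptions: no namespaces, a source document supplied as the initial context, and paths written in the positional style from the question. The names local:path, byPath, $target and $keep are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:local="http://example.com/local"
  exclude-result-prefixes="xs local">

  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="XPath" as="xs:string+"
    select="('/ROOT[1]/DATA[1]/ALLOC[2]/VALUE[1]',
             '/ROOT[1]/SOURCE[1]/CONNECTION[1]/CSTRING[1]')"/>

  <!-- Builds the positional path of an element, e.g. /ROOT[1]/DATA[1] -->
  <xsl:function name="local:path" as="xs:string">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence select="string-join(
        for $a in $e/ancestor-or-self::*
        return concat('/', name($a), '[',
                      1 + count($a/preceding-sibling::*[name() eq name($a)]),
                      ']'),
        '')"/>
  </xsl:function>

  <!-- Index every element by its positional path -->
  <xsl:key name="byPath" match="*" use="local:path(.)"/>

  <!-- Requested elements, and those elements plus all their ancestors -->
  <xsl:variable name="target" select="key('byPath', $XPath)"/>
  <xsl:variable name="keep" select="$target/ancestor-or-self::*"/>

  <!-- Requested element: deep copy -->
  <xsl:template match="*[. intersect $target]" priority="2">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Ancestor of a requested element: shallow copy, then descend -->
  <xsl:template match="*[. intersect $keep]" priority="1">
    <xsl:copy>
      <xsl:apply-templates select="*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Everything else is dropped -->
  <xsl:template match="*"/>

</xsl:stylesheet>
```

The key is built once per document, so each requested path is resolved with a lookup rather than by recomputing paths in every template.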

TMTOWTDI. Whichever method you choose, you'll probably want to use XSLT 2 so that you can treat your list of XPath addresses as a sequence of strings; you'll probably also want to expand out that sequence to include all ancestors - "/ROOT[1]/DATA[1]/ALLOC[2]" becomes ("/ROOT[1]/DATA[1]/ALLOC[2]", "/ROOT[1]/DATA[1]", "/ROOT[1]") - to simplify things.

Hell, I got bored and did you an XSLT 2 implementation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:local="http://example.com/local"
  exclude-result-prefixes="xs local"
  version="2.0">

  <xsl:output indent="yes"/>

  <xsl:param name="XPath" select="('/ROOT[1]/DATA[1]/ALLOC[2]/VALUE[1]', '/ROOT[1]/SOURCE[1]/CONNECTION[1]/CSTRING[1]')" as="xs:string+"/>

  <xsl:variable name="XPe" as="xs:string+">
    <xsl:for-each select="$XPath">
      <xsl:sequence select="local:ancestorize(.)"/>
    </xsl:for-each>
  </xsl:variable>

  <xsl:variable name="XPd" as="xs:string+">
    <xsl:sequence select="distinct-values($XPe)"/>
  </xsl:variable>

  <xsl:template match="@*|*">
    <xsl:param name="parentXP" as="xs:string?"/>
    <xsl:variable name="selfXP" as="xs:string">
      <xsl:variable name="seq">
        <xsl:value-of select="$parentXP"/>
        <xsl:text>/</xsl:text>
        <xsl:if test="self::attribute()">
          <!-- true only when the context node is an attribute; also works when the parent carries several attributes -->
          <xsl:text>@</xsl:text>
        </xsl:if>
        <!-- I'm assuming no namespaces: if you have namespaces you'll have to build in your prefix here -->
        <xsl:value-of select="local-name()"/>
        <xsl:text>[</xsl:text>
        <xsl:value-of select="1 + count(preceding-sibling::*[name() eq current()/name()])"/>
        <xsl:text>]</xsl:text>
      </xsl:variable>
      <xsl:value-of select="xs:string($seq)"/>
    </xsl:variable>
    <xsl:if test="$selfXP = $XPd">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()">
          <xsl:with-param name="parentXP" select="$selfXP"/>
        </xsl:apply-templates>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:param name="parentXP"/>
    <xsl:if test="$parentXP = $XPd and normalize-space(.) ne ''">
      <xsl:copy/>
    </xsl:if>
  </xsl:template>

  <xsl:function name="local:ancestorize" as="xs:string+">
    <xsl:param name="XPath" as="xs:string"/>
    <xsl:sequence select="$XPath"/>
    <xsl:if test="count(tokenize($XPath, '/')) gt 1">
      <xsl:sequence select="local:ancestorize(string-join((tokenize($XPath, '/'))[not(position() eq last())], '/'))"/>
    </xsl:if>
  </xsl:function>

</xsl:stylesheet>
Tom Hillman