What would be the best way to populate (or generate) an XML template-file from a mapping of XPath expressions?

The requirement is that we start from a template, since it might contain information not otherwise captured in the XPath expressions.

For example, a starting template might be:

<s11:Envelope xmlns:s11='http://schemas.xmlsoap.org/soap/envelope/'>
  <s11:Body>
    <ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
      <article xmlns:ns1='http://predic8.com/material/1/'>
        <name>?XXX?</name>
        <description>?XXX?</description>
        <price xmlns:ns1='http://predic8.com/common/1/'>
          <amount>?999.99?</amount>
          <currency xmlns:ns1='http://predic8.com/common/1/'>???</currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/1/'>???</id>
      </article>
    </ns1:create>
  </s11:Body>
</s11:Envelope>

We are then supplied with something like:

expression: /create/article[1]/id                => 1
expression: /create/article[1]/description       => bar
expression: /create/article[1]/name[1]           => foo
expression: /create/article[1]/price[1]/amount   => 00.00
expression: /create/article[1]/price[1]/currency => USD
expression: /create/article[2]/id                => 2
expression: /create/article[2]/description       => some name
expression: /create/article[2]/name[1]           => some description
expression: /create/article[2]/price[1]/amount   => 00.01
expression: /create/article[2]/price[1]/currency => USD

We should then generate:

<ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
    <article xmlns:ns1='http://predic8.com/material/1/'>
        <name xmlns:ns1='http://predic8.com/material/1/'>foo</name>
        <description>bar</description>
        <price xmlns:ns1='http://predic8.com/common/1/'>
            <amount>00.00</amount>
            <currency xmlns:ns1='http://predic8.com/common/1/'>USD</currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/1/'>1</id>
    </article>
    <article xmlns:ns1='http://predic8.com/material/2/'>
        <name>some description</name>
        <description>some name</description>
        <price xmlns:ns1='http://predic8.com/common/2/'>
            <amount>00.01</amount>
            <currency xmlns:ns1='http://predic8.com/common/2/'>USD</currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/2/'>2</id>
    </article>
</ns1:create>

I am implementing this in Java, although I would prefer an XSLT-based solution if one is possible.

PS: This question is the reverse of another question I recently asked.

Larry
  • Do you want a solution in Java? or in XSLT? Your question suggests Java, but you have tagged the question as XSLT. Also note, that in your output document, you have a large number of pointless name-space declarations. – Sean B. Durkin Jul 08 '12 at 18:06
  • Actually, the XML shown above is generated by some libraries I’m using. The declarations may be pointless, but it shows that we need to consider namespaces and the like, which also need to be part of the final output. As for the solution: I would really prefer an XSLT solution; however, if that is not possible, I would opt for a Java solution. – Larry Jul 08 '12 at 18:15
  • I see an inconsistency: Why some element names such as `id` and `description` aren't followed by `[1]` while the rest of the leaf nodes are followed by `[1]` ? – Dimitre Novatchev Jul 08 '12 at 19:18
  • It is easier to manually create the wanted XML document than to create the set of population expressions -- I strongly recommend not implementing such processing at all. It is necessary to have a sound design so that any such "curiosities" are avoided. – Dimitre Novatchev Jul 08 '12 at 19:22
  • @DimitreNovatchev It is only because those elements can potentially occur multiple times. If this is an issue, we could of course make it all consistent, like the output shown in the linked question (i.e. where all elements have [1], and those with multiple occurrences are then iterated, etc.). – Larry Jul 08 '12 at 19:25
  • Larry, As I already commented, such processing is totally unnecessary -- whoever creates the expression will need *less* time in creating the complete XML document. – Dimitre Novatchev Jul 08 '12 at 19:32
  • @DimitreNovatchev I would agree, but the point is that I’m working with an API that produces the expressions. So I don’t really have any choice: I need to formulate an appropriate XML message that corresponds to the expressions. Of course, if I were the one producing the expressions, that would be a different story. But I need to work with what is given... so if you could think of an efficient solution, it would be greatly appreciated! – Larry Jul 08 '12 at 19:48
  • Larry, would it be feasible to accept a 2-invocation solution? That means that you run one style-sheet with the XPATH information as input. Its output is a style-sheet, which you then run as the second invocation? If you can tolerate a two-step process, this may be the simplest solution. – Sean B. Durkin Jul 08 '12 at 23:51

2 Answers

This transformation creates from the "expressions" an XML document that has the structure of the wanted result -- it remains to transform this result into the final result:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:my="my:my">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:variable name="vPop" as="element()*">
    <item path="/create/article[1]/id">1</item>
    <item path="/create/article[1]/description">bar</item>
    <item path="/create/article[1]/name[1]">foo</item>
    <item path="/create/article[1]/price[1]/amount">00.00</item>
    <item path="/create/article[1]/price[1]/currency">USD</item>
    <item path="/create/article[1]/price[2]/amount">11.11</item>
    <item path="/create/article[1]/price[2]/currency">AUD</item>
    <item path="/create/article[2]/id">2</item>
    <item path="/create/article[2]/description">some name</item>
    <item path="/create/article[2]/name[1]">some description</item>
    <item path="/create/article[2]/price[1]/amount">00.01</item>
    <item path="/create/article[2]/price[1]/currency">USD</item>
 </xsl:variable>

 <xsl:template match="/">
  <xsl:sequence select="my:subTree($vPop/@path/concat(.,'/',string(..)))"/>
 </xsl:template>

 <xsl:function name="my:subTree" as="node()*">
  <xsl:param name="pPaths" as="xs:string*"/>

  <xsl:for-each-group select="$pPaths"
    group-adjacent=
        "substring-before(substring-after(concat(., '/'), '/'), '/')">
    <xsl:if test="current-grouping-key()">
     <xsl:choose>
       <xsl:when test=
          "substring-after(current-group()[1], current-grouping-key())">
         <xsl:element name=
           "{substring-before(concat(current-grouping-key(), '['), '[')}">

          <xsl:sequence select=
            "my:subTree(for $s in current-group()
                         return
                            concat('/',substring-after(substring($s, 2),'/'))
                             )
            "/>
        </xsl:element>
       </xsl:when>
       <xsl:otherwise>
        <xsl:value-of select="current-grouping-key()"/>
       </xsl:otherwise>
     </xsl:choose>
     </xsl:if>
  </xsl:for-each-group>
 </xsl:function>
</xsl:stylesheet>

When this transformation is applied to any XML document (the input is not used), the result is:

<create>
   <article>
      <id>1</id>
      <description>bar</description>
      <name>foo</name>
      <price>
         <amount>00.00</amount>
         <currency>USD</currency>
      </price>
      <price>
         <amount>11.11</amount>
         <currency>AUD</currency>
      </price>
   </article>
   <article>
      <id>2</id>
      <description>some name</description>
      <name>some description</name>
      <price>
         <amount>00.01</amount>
         <currency>USD</currency>
      </price>
   </article>
</create>
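
For readers working in Java, the same idea of growing a tree from the path strings can be sketched with the DOM API. This is only an illustration, not a port of the stylesheet above: the class `PathTreeBuilder` and its methods are hypothetical names, and it assumes that every path is absolute, that all paths share one root element, and that each step is either `name` or `name[n]`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper (not part of the answer): builds a DOM tree from path/value pairs.
final class PathTreeBuilder {

    /** Returns the index-th (1-based) child element called name, appending empty ones as needed. */
    static Element child(Document doc, Node parent, String name, int index) {
        int seen = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name) && ++seen == index) {
                return (Element) n;
            }
        }
        Element created = null;
        while (seen++ < index) {            // create the missing siblings up to the requested index
            created = doc.createElement(name);
            parent.appendChild(created);
        }
        return created;
    }

    /** Assumes absolute paths with a single shared root and steps of the form name or name[n]. */
    static Document build(Map<String, String> values) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        for (Map.Entry<String, String> entry : values.entrySet()) {
            Node current = doc;
            for (String step : entry.getKey().substring(1).split("/")) {
                int bracket = step.indexOf('[');
                String name = bracket < 0 ? step : step.substring(0, bracket);
                int index = bracket < 0 ? 1
                        : Integer.parseInt(step.substring(bracket + 1, step.indexOf(']')));
                current = child(doc, current, name, index);
            }
            current.setTextContent(entry.getValue());   // the last step receives the value
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("/create/article[1]/id", "1");
        values.put("/create/article[1]/price[2]/amount", "11.11");
        values.put("/create/article[2]/id", "2");
        Document doc = build(values);
        // prints "2 article elements"
        System.out.println(doc.getElementsByTagName("article").getLength() + " article elements");
    }
}
```

Unlike the grouping in the stylesheet, this sketch creates empty siblings when an index is skipped (asking for `price[2]` first also creates `price[1]`), which may or may not be what is wanted.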

Note:

  1. You need to transform the "expressions" you are given into the format used in this transformation -- this is easy and straightforward.

  2. In the final transformation you need to copy every node "as-is" (using the identity rule), with the exception that the top node should be generated in the "http://predic8.com/wsdl/material/ArticleService/1/" namespace. Note that the other namespaces present in the "template" are not used and can be safely omitted.
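
The conversion mentioned in the first note could be sketched in Java roughly as follows. The class `ExpressionItems` is a hypothetical helper, and the sketch assumes each input line has exactly the shape shown in the question, `expression: PATH => VALUE`.

```java
import java.util.List;

// Hypothetical helper: rewrites "expression: PATH => VALUE" lines as <item> elements.
final class ExpressionItems {

    /** Escapes the characters that are significant in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Turns one expression line into an <item path="...">value</item> element. */
    static String toItem(String line) {
        int arrow = line.indexOf("=>");
        if (!line.startsWith("expression:") || arrow < 0) {
            throw new IllegalArgumentException("Unrecognised line: " + line);
        }
        String path = line.substring("expression:".length(), arrow).trim();
        String value = line.substring(arrow + 2).trim();
        return "<item path=\"" + escape(path) + "\">" + escape(value) + "</item>";
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "expression: /create/article[1]/id                => 1",
                "expression: /create/article[1]/price[1]/amount   => 00.00");
        for (String line : lines) {
            System.out.println(toItem(line));
        }
    }
}
```

The resulting `<item>` elements can then be pasted into (or supplied to) the `vPop` variable of the stylesheet.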

Dimitre Novatchev
  • Thanks, but is this a generic solution? – Larry Jul 09 '12 at 05:09
  • @Larry: Of course, this is a generic solution. It works with any set of "expressions" -- do note that in my example I added two new expressions, containing `price[2]`. If you need the created elements to be in specific namespace, these need to be specified (not any template at all) in a separate XML document using a convenient notation. – Dimitre Novatchev Jul 09 '12 at 11:46
  • Ok, thanks. Btw: how about attribute values, would these work as well? For example, an expression such as: `/create/article[@val]=> 123`? – Larry Jul 09 '12 at 12:03
  • @Larry, almost everything can be done, however this question has become too big and is "overflowing". I recommend that you rethink your current requirements and ask a new question. I believe this answer is a correct solution to the current question and you may consider accepting it. – Dimitre Novatchev Jul 09 '12 at 12:09
  • Thanks again, I’ve accepted. In regard to requirements, I really need what I’ve mentioned in the Q. I.e. that I want to populate an XML file, given a mapping of XPath expressions. The added requirement is that, I also want to begin with a template file, in the case that there is additional information not directly captured in the XPath expressions. For e.g., namespaces and other attributes. So perhaps, I was hoping a solution would be one where the template could be taken as “input” and used for this purpose. Furthermore, as the question also specifies Xpath, attributes should work as well. – Larry Jul 09 '12 at 12:29
  • @Larry, You have accumulated enough new requirements for a new question. Please, ask it. I want only to warn that the "template file" is useless and it may contradict the set of expressions. I would recommend adding "namespace expressions", similar in spirit to the ones used by Sean. – Dimitre Novatchev Jul 09 '12 at 12:41
  • Ok thanks, I took your suggestion and asked a more detailed question with broader requirements. You can have a look at here: http://stackoverflow.com/questions/11395990/. If you have a solution, it would be much appreciated. – Larry Jul 09 '12 at 13:27

This solution requires you to reorganise your XPATH input information slightly, and to allow a 2-step transformation. The first transformation writes the stylesheet that is executed in the second transformation, so the client must invoke the XSLT engine twice. Let us know if this is a problem.
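
From Java, the two invocations might be chained with the standard `javax.xml.transform` API along the following lines. This is a sketch only: `TwoStepTransform` is a hypothetical class, and the stylesheets in `main` are trivial XSLT 1.0 placeholders chosen so the example runs on a plain JDK. The processor bundled with the JDK implements XSLT 1.0, so the 2.0 stylesheets in this answer would need a 2.0 processor such as Saxon on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: runs a stylesheet-generating stylesheet, then runs its output.
final class TwoStepTransform {

    /** Applies one stylesheet to one input document and returns the serialised result. */
    static String transform(String stylesheet, String input) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)))
                    .transform(new StreamSource(new StringReader(input)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Step one turns the rules into a stylesheet; step two applies that stylesheet to the template. */
    static String twoStep(String stepOneStylesheet, String rules, String template) {
        String stepTwoStylesheet = transform(stepOneStylesheet, rules);
        return transform(stepTwoStylesheet, template);
    }

    public static void main(String[] args) {
        String xsl = "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";
        // Placeholder "rules" document: here it is simply an identity stylesheet.
        String identity = "<xsl:stylesheet version='1.0' " + xsl + ">"
                + "<xsl:template match='@*|node()'><xsl:copy>"
                + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
                + "</xsl:stylesheet>";
        // Placeholder step-one stylesheet: copies its input, so step two runs the identity rule.
        String copyInput = "<xsl:stylesheet version='1.0' " + xsl + ">"
                + "<xsl:template match='/'><xsl:copy-of select='/*'/></xsl:template>"
                + "</xsl:stylesheet>";
        System.out.println(twoStep(copyInput, identity, "<a>x</a>"));
    }
}
```

In real use, the first argument would be the step-one stylesheet, the second the rules document, and the third the SOAP envelope template.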

Step One

Please re-organise your XPATH information into an XML document like so. It should not be difficult to do, and even an XSLT script could be written to do the job.

<paths>
 <rule>
  <match>article[1]/id[1]</match>
  <namespaces>
   <namespace prefix="ns1">http://predic8.com/wsdl/material/ArticleService/1/</namespace>
   <!-- The namespace node declares a namespace that is used in the match expression.
        There can be many of these. It is not required to define the s11: namespace,
        nor the ns1 namespace. -->
  </namespaces>
  <replacement>1</replacement>
 </rule> 
 <rule>
  <match>article[1]/description[1]</match>
  <namespaces/>
  <replacement>bar</replacement>
 </rule>
 ... etc ...
</paths>

Solution constraints

In the above rules document we are constrained so that:

  1. The match is implicitly prefixed 'expression: /create/'. Don't put that explicitly.
  2. All matches must begin like article[n] where n is some ordinal number.
  3. We can't have zero rules.
  4. Any prefixes used in the match, other than s11="http://schemas.xmlsoap.org/soap/envelope/" and ns1="http://predic8.com/wsdl/material/ArticleService/1/", must be defined in the namespaces node. (Note: I don't think it is valid for namespaces to end in '/', but I'm not sure about that.)

The above is the input document to the step one transformation. Apply this document to this style-sheet ...

<xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
      xmlns:step2="http://www.w3.org/1999/XSL/Transform-step2"
      xmlns:s11="http://schemas.xmlsoap.org/soap/envelope/"                       
      xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes='xsl'>
<xsl:output method="xml" indent="yes" encoding="UTF-8" />
<xsl:namespace-alias stylesheet-prefix="step2" result-prefix="xsl"/>

<xsl:template match="/">
 <step2:stylesheet version="2.0">
 <step2:output method="xml" indent="yes" encoding="UTF-8" />

  <step2:variable name="replicated-template" as="element()*">
   <step2:apply-templates select="/" mode="replication" />
  </step2:variable>

  <step2:template match="@*|node()" mode="replication">
     <step2:copy>
        <step2:apply-templates select="@*|node()" mode="replication" />
     </step2:copy>
  </step2:template>

  <step2:template match="/s11:Envelope/s11:Body/ns1:create/article" mode="replication">
   <step2:variable name="replicant" select="." />  
    <step2:for-each select="for $i in 1 to
       {max(for $m in /paths/rule/match return
        xs:integer(substring-before(substring-after($m,'article['),']')))}
          return $i">
   <step2:for-each select="$replicant">
       <step2:copy>
        <step2:apply-templates select="@*|node()" mode="replication" />
       </step2:copy>
      </step2:for-each>   
     </step2:for-each>    
  </step2:template>

  <step2:template match="@*|node()">
   <step2:copy>
    <step2:apply-templates select="@*|node()"/>
   </step2:copy>
  </step2:template> 

  <step2:template match="/">
   <step2:apply-templates select="$replicated-template" />
  </step2:template>

  <xsl:apply-templates select="paths/rule" /> 
 </step2:stylesheet>
</xsl:template>

<xsl:template match="rule">
 <step2:template match="s11:Envelope/s11:Body/ns1:create/{match}">
  <xsl:for-each select="namespaces/namespace">
   <xsl:namespace name="{@prefix}" select="." />
  </xsl:for-each>
  <step2:copy>
   <step2:apply-templates select="@*"/>
   <step2:value-of select="'{replacement}'"/>
   <step2:apply-templates select="*"/>
  </step2:copy>
 </step2:template>
</xsl:template>

</xsl:stylesheet>

Step Two

Apply your soap envelope file, as an input document, to the style-sheet which was output from step one. The result is the original soap document, altered as required. This is a sample of a step two style-sheet, with just the first rule (/create/article[1]/id => 1) being considered for the sake of simplicity of illustration.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:s11="http://schemas.xmlsoap.org/soap/envelope/"
                version="2.0">
   <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>
   <xsl:template xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/"
                 match="/s11:Envelope/s11:Body/ns1:create[1]/article[1]/id[1]">
      <xsl:copy>
         <xsl:apply-templates select="@*"/>
         <xsl:value-of select="'1'"/>
         <xsl:apply-templates select="*"/>
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>

More solution constraints

  1. The template document must contain at least one /s11:Envelope/s11:Body/ns1:create/article. Only the article node is replicated (deeply) as required by the rules. Other than that, it can be any structure.
  2. The template document cannot contain nested levels of the s11:Envelope/s11:Body/ns1:create node.

Explanation

You will notice that your XPATH expressions are not far removed from the match condition of a template. Therefore it is not too difficult to write a stylesheet which re-expresses your XPATH and replacement values as template rules. When writing a stylesheet-writing stylesheet, xsl:namespace-alias enables us to disambiguate "xsl:" as an instruction from "xsl:" as intended output. When XSLT 3.0 comes along, we are quite likely to be able to reduce this algorithm to one step, as it will allow dynamic XPATH evaluation, which is really the nub of your problem. But for the moment we must be content with a 2-step process.
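
Outside XSLT, dynamic XPath evaluation is available in Java through the standard `javax.xml.xpath` API, so a Java fallback could evaluate each supplied expression against the template directly. The sketch below is an assumption-laden illustration, not part of the solution above: `TemplatePopulator` is a hypothetical class, the template is rooted at `create` rather than wrapped in a SOAP envelope, and each step is rewritten to a `local-name()` test so that the unprefixed expressions still match prefixed template elements. Only element steps of the form `name` or `name[n]` are handled.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: fills template text nodes by evaluating XPath strings at run time.
final class TemplatePopulator {

    /** Rewrites /a/b[1] as /*[local-name()='a']/*[local-name()='b'][1] to ignore prefixes. */
    static String toLocalNamePath(String path) {
        StringBuilder sb = new StringBuilder();
        for (String step : path.substring(1).split("/")) {
            int bracket = step.indexOf('[');
            String name = bracket < 0 ? step : step.substring(0, bracket);
            String predicate = bracket < 0 ? "" : step.substring(bracket);
            sb.append("/*[local-name()='").append(name).append("']").append(predicate);
        }
        return sb.toString();
    }

    /** Sets the text of every node an expression selects; expressions with no match are skipped. */
    static Document populate(String templateXml, Map<String, String> values) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(templateXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (Map.Entry<String, String> entry : values.entrySet()) {
                Node node = (Node) xpath.evaluate(
                        toLocalNamePath(entry.getKey()), doc, XPathConstants.NODE);
                if (node != null) {
                    node.setTextContent(entry.getValue());
                }
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String template = "<ns1:create xmlns:ns1='urn:example'>"
                + "<article><name>?XXX?</name><id>???</id></article></ns1:create>";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("/create/article[1]/id", "1");
        values.put("/create/article[1]/name[1]", "foo");
        Document doc = populate(template, values);
        System.out.println(doc.getElementsByTagName("name").item(0).getTextContent()); // prints foo
    }
}
```

Note that this only replaces values in nodes that already exist: an expression such as `/create/article[2]/id` finds nothing in a template with a single `article`, so the template would have to be expanded first.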

The second style-sheet is a two-phase transformation. The first stage replicates the template from the article level, as many times as needed by the rules. The second phase parses this replicated template, and applies the dynamic rules substituting text values as indicated by the XPATHs.
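
The replication phase can also be pictured in Java DOM terms. The following sketch is a hypothetical helper, `ArticleReplicator`, which assumes the element to repeat is called `article` as in the question and uses a namespace-free template for brevity. It finds the highest `article[n]` index in the supplied paths and deep-clones the template's `article` until that many siblings exist. The substitution phase is left out.

```java
import java.io.StringReader;
import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: repeats the template's article element as often as the paths require.
final class ArticleReplicator {
    private static final Pattern ARTICLE_INDEX = Pattern.compile("article\\[(\\d+)\\]");

    /** Returns the largest n found in any article[n] step, or 1 if there is none. */
    static int maxArticleIndex(Collection<String> paths) {
        int max = 1;
        for (String path : paths) {
            Matcher m = ARTICLE_INDEX.matcher(path);
            if (m.find()) {
                max = Math.max(max, Integer.parseInt(m.group(1)));
            }
        }
        return max;
    }

    /** Inserts deep copies right after the prototype until 'total' siblings exist. */
    static void replicate(Element prototype, int total) {
        Node parent = prototype.getParentNode();
        Node next = prototype.getNextSibling();   // null means "append at the end"
        for (int i = 1; i < total; i++) {
            parent.insertBefore(prototype.cloneNode(true), next);
        }
    }

    /** Parses an XML string, wrapping checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<create><article><id>???</id></article></create>");
        int total = maxArticleIndex(List.of("/create/article[1]/id", "/create/article[2]/id"));
        replicate((Element) doc.getElementsByTagName("article").item(0), total);
        System.out.println(doc.getElementsByTagName("article").getLength()); // prints 2
    }
}
```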


UPDATE

My original post was wrong. Thanks to Dimitre for pointing out the error. Please find updated solution above.

After-thought

If a two-step solution is too complicated, and you are running on a wintel platform, you may consider purchasing the commercial version of Saxon. I believe that the commercial version has a dynamic XPATH evaluation function. I can't give you such a solution because I don't have the commercial version. I imagine a solution using an evaluate() function would be a lot simpler. XSLT is just a hobby for me. But if you are using XSLT for business purposes, the price is quite reasonable.

Sean B. Durkin
  • Sean, the approach is good, with the exception of one remaining problem: if I understand your code well, it will search for `...article[2]` and will not find such a node, as there isn't (and can't be) more than one `article` element in the "template" provided by the OP. For this reason, the result must be generated -- not just by simply doing replacements in the "template". This means that if you construct all the rules, the stylesheet generated in the first pass, when applied to the "template" in the second pass, doesn't produce more than one `article` element (or any other element `x[2]`). – Dimitre Novatchev Jul 09 '12 at 02:44
  • I see I misread the question. I think the 2-step method might still be workable for this question, but it becomes a lot harder. I'll have a think about it. – Sean B. Durkin Jul 09 '12 at 03:15
  • @DimitreNovatchev: Thank you for pointing that out. Solution corrected. – Sean B. Durkin Jul 09 '12 at 05:43
  • Sean, I see. However, this code is not generic and uses hardcoded element names as `article` and `id`. I believe that the OP is looking for a generic solution. – Dimitre Novatchev Jul 09 '12 at 11:58