XProc: multiple XSLT transformations with intermediate files

Question

I need to do several XSLT transformations with intermediate XML files. (I do need the files; the real case is a bit trickier, because a later step loads the intermediate files.)

first.xml ------------>   intermediate.xml ------------> final.xml
          first.xsl                         final.xsl

I'd like to create an XProc pipeline. I have tried the following code, but it gives me an error:

SCHWERWIEGEND: runxslt.xpl:26:44:err:XD0011:Could not read: intermediate.xml 
17.05.2012 15:15:35 com.xmlcalabash.drivers.Main error
SCHWERWIEGEND: It is a dynamic error if the resource referenced by a p:document element does not exist, cannot be accessed, or is not a well-formed XML document.
17.05.2012 15:15:35 com.xmlcalabash.drivers.Main error
SCHWERWIEGEND: Underlying exception: net.sf.saxon.s9api.SaxonApiException: I/O error reported by XML parser processing file:/<somepath>/intermediate.xml:
/<somepath>/intermediate.xml (No such file or directory)

(SCHWERWIEGEND means something like FATAL.) So obviously the file intermediate.xml has not been written by the time the second transformation tries to read it.

This is the xpl-document that I have used:

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">

  <p:input port="source">
    <p:document href="first.xml"/>
  </p:input>

  <p:output port="result" sequence="true">
    <p:empty/>
  </p:output>

  <p:xslt name="first-to-intermediate">
    <p:input port="stylesheet">
      <p:document href="first.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <p:store href="intermediate.xml" />

  <p:xslt>
    <p:input port="source">
      <p:document href="intermediate.xml"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="final.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <p:store href="final.xml"/>

</p:declare-step>

For the sake of completeness, these are the input and transformation files:

first.xml (the source document):

<root>
  <element name="A" />
</root>

first.xsl:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
  <xsl:output indent="yes"/>

  <xsl:template match="root">
    <root>
      <xsl:apply-templates/>
    </root>
  </xsl:template>
  <xsl:template match="element">
    <intermediate name="A" />
  </xsl:template>

</xsl:stylesheet>

final.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
  <xsl:output indent="yes"/>

  <xsl:template match="root">
    <root>
      <xsl:apply-templates/>
    </root>
  </xsl:template>
  <xsl:template match="intermediate">
    <final name="A" />
  </xsl:template>

</xsl:stylesheet>

Here is a note on the real application (the above is a simplification, of course).

1. Convert the source into something more suitable for my processing. Output: companies.xml
2. Take the output from step 1 and create an index file (index.xml) from it. The index file must be editable manually.
3. Merge the files created by steps 1 and 2 into a final XML file (final.xml).

The index file must be written to disk and I must be able to run the last step by itself (that's a different problem - I'd write a different pipeline for that)

Writing companies.xml (step 1) to disk is optional; it could be kept in memory (but it might get large).

grtjn · Accepted Answer · 2012-05-17T19:00:35.407

I'm not really sure why XMLCalabash doesn't work here. In principle the logic looks as if it should work, but XMLCalabash apparently holds off on writing the file to disk until later, perhaps even until the end of the run.

But there is an elegant solution, because you don't need to store intermediate results before continuing processing. In fact, it is best to not use hard-coded loads and stores at all. Instead, use something like the following:

<?xml version="1.0" encoding="UTF-8"?> 
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" 
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0"> 

  <p:input port="source" sequence="true"/> 
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" sequence="true"/> 

  <p:xslt name="first-to-intermediate"> 
    <p:input port="stylesheet"> 
      <p:document href="first.xsl"/> 
    </p:input> 
  </p:xslt> 

  <p:xslt> 
    <p:input port="stylesheet"> 
      <p:document href="final.xsl"/> 
    </p:input> 
  </p:xslt> 

</p:declare-step>

It requires a slightly different call to XMLCalabash. Call it like this:

java -jar Calabash.jar -i source=first.xml -o result=final.xml runxslt.xpl

With -i you bind an input port to an input file from outside the script, so no hard-coding is required. Similarly, with -o you redirect an output port to a target file.

I also added a 'parameters' input to your code, which gets connected automatically to the parameters ports of the p:xslt steps. That way you don't need to bind those to a p:empty. It also allows passing parameter values from the command line into those stylesheets.

And because I removed the p:store, the explicit 'source' input of the second p:xslt is not necessary either. The result of the first p:xslt goes directly into the (primary) source input of the following step by default.
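To make those default connections visible, here is a sketch of the same pipeline with every binding written out by hand. The step names main and intermediate-to-final are my own additions for illustration, and I dropped the sequence attributes for brevity; as far as I can tell it should behave the same as the shorter version above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  name="main" version="1.0">

  <p:input port="source"/>
  <p:input port="parameters" kind="parameter"/>

  <!-- Explicit form of "bind to the primary output of the last step" -->
  <p:output port="result">
    <p:pipe step="intermediate-to-final" port="result"/>
  </p:output>

  <p:xslt name="first-to-intermediate">
    <!-- Explicit form of the default: read the pipeline's own source port -->
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="first.xsl"/>
    </p:input>
    <!-- Explicit form of the automatic parameter-port connection -->
    <p:input port="parameters">
      <p:pipe step="main" port="parameters"/>
    </p:input>
  </p:xslt>

  <p:xslt name="intermediate-to-final">
    <!-- Explicit form of the default: read the preceding step's result -->
    <p:input port="source">
      <p:pipe step="first-to-intermediate" port="result"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="final.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:pipe step="main" port="parameters"/>
    </p:input>
  </p:xslt>

</p:declare-step>
```

You would normally not write it this way; the point is only to show which bindings the processor fills in for you.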

-- edit --

To elaborate on my own comments: you can do a p:store and still reuse the output of the first p:xslt a second time, without loading the intermediate document from disk. You can do it like this:

<?xml version="1.0" encoding="UTF-8"?> 
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" 
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0"> 

  <p:input port="source" sequence="false"/> 
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" sequence="false"/> 

  <p:xslt name="first-to-intermediate"> 
    <p:input port="stylesheet"> 
      <p:document href="first.xsl"/> 
    </p:input> 
  </p:xslt> 

  <p:store href="intermediate.xml"/>

  <p:xslt> 
    <p:input port="source"> 
      <p:pipe step="first-to-intermediate" port="result"/> 
    </p:input> 
    <p:input port="stylesheet"> 
      <p:document href="final.xsl"/> 
    </p:input> 
  </p:xslt> 

</p:declare-step>

Note that I changed sequence="true" to sequence="false" on both the input and the output of the declare-step. Storing sequences of intermediate results requires extra care, so restricting the pipeline to single documents should prevent mistakes.
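The same pattern might extend to the three-step case described in the question. The following is only a sketch under assumptions: the stylesheet names companies.xsl, index.xsl and merge.xsl, the step names, and the merge-input wrapper element are all hypothetical, and it merges the freshly generated index rather than a manually edited index.xml (that would belong in the separate pipeline mentioned in the question).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

  <p:input port="source"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result"/>

  <!-- Step 1: convert the source (companies.xsl is a hypothetical name) -->
  <p:xslt name="to-companies">
    <p:input port="stylesheet">
      <p:document href="companies.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Step 2: build the index from the in-memory result of step 1 -->
  <p:xslt name="to-index">
    <p:input port="stylesheet">
      <p:document href="index.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Write the index to disk; it reads the result of to-index by default -->
  <p:store href="index.xml"/>

  <!-- Step 3: wrap both in-memory results in one (hypothetical)
       merge-input element so a single stylesheet can see them.
       The explicit pipes are needed because p:store has no primary output. -->
  <p:wrap-sequence wrapper="merge-input">
    <p:input port="source">
      <p:pipe step="to-companies" port="result"/>
      <p:pipe step="to-index" port="result"/>
    </p:input>
  </p:wrap-sequence>

  <p:xslt name="merge">
    <p:input port="stylesheet">
      <p:document href="merge.xsl"/>
    </p:input>
  </p:xslt>

</p:declare-step>
```

Nothing is read back from disk here, so the result does not depend on when the processor actually writes index.xml.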

HTH!

Well, in my case I _need_ the intermediate XML files. I was unclear in my question; I apologize. I'll edit it. — topskip, May 17 '12 at 18:14
@patrick Can you elaborate on what you are using them for? You can use p:log to store intermediate results for debugging purposes, or you could do the p:store between the p:xslt steps, but instead of reading the stored file, use p:pipe in the input of the second p:xslt to refer to the output of the first p:xslt. You can tie as many inputs of steps to the result of a single other step as you like. — grtjn, May 17 '12 at 18:19
Done at the end of the document. Was I able to express myself clearly? If not, I'll try to find better words. — topskip, May 17 '12 at 18:26
OK, now I understand. The trick is to connect the input of the second XSLT transformation _not_ to the file but to the output of step one, so the order of the steps matters and the XProc processor does not optimize anything away. Thank you very much! — topskip, May 20 '12 at 12:07
@patrick: In addition to what is said on the xproc mailing list: the trick is that steps are connected automatically as long as they have primary inputs and outputs. p:store doesn't have a primary output, so that is why you have to specify an input for the second p:xslt explicitly. But instead of loading a file (which requires a stable execution order), you can easily connect directly to other output ports. You'll notice you can even point to outputs of steps written *after* that step. As long as it's not circular, it should work. — grtjn, May 20 '12 at 19:38
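As a sketch of that last comment: the p:store can also be moved out of the main chain, so the two p:xslt steps connect through their primary ports by default and only the p:store needs an explicit p:pipe. The step name intermediate-to-final is an assumption added for illustration. Because p:store, now the last step, has no primary output, the pipeline's result port is bound explicitly here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

  <p:input port="source"/>
  <p:input port="parameters" kind="parameter"/>

  <!-- Bound explicitly: the last step (p:store) has no primary output -->
  <p:output port="result">
    <p:pipe step="intermediate-to-final" port="result"/>
  </p:output>

  <p:xslt name="first-to-intermediate">
    <p:input port="stylesheet">
      <p:document href="first.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Connected automatically to the result of first-to-intermediate -->
  <p:xslt name="intermediate-to-final">
    <p:input port="stylesheet">
      <p:document href="final.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Side branch: a second reader of the first step's result -->
  <p:store href="intermediate.xml">
    <p:input port="source">
      <p:pipe step="first-to-intermediate" port="result"/>
    </p:input>
  </p:store>

</p:declare-step>
```

Where the p:store sits in the document does not matter for correctness, since neither transformation reads intermediate.xml from disk.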
