
I have a bunch of XML files with a fixed, country-based naming schema: report_en.xml, report_de.xml, report_fr.xml, etc. Now I want to write an XSLT stylesheet that reads each of these files via the document() function, extracts some values, and generates one XML file with a summary. My question is: how can I iterate over the source files without knowing in advance the exact names of the files I will process?

At the moment I'm planning to generate an auxiliary XML file that holds all the file names and to iterate over that file in my stylesheet. The file list will be generated with a small PHP or bash script. Are there better alternatives?
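A minimal sketch of the file-list script described above, in bash. The function name `make_file_list`, the output name `files.xml`, and the `<files>`/`<file>` element names are assumptions made for illustration; nothing in the question fixes them. The script also assumes the fixed naming scheme, so file names contain no characters that would need XML escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an auxiliary XML file that lists every
# report_*.xml file in a directory, for a stylesheet to iterate over.
# Element names (<files>, <file>) are illustrative assumptions.

make_file_list() {
    local dir=$1 out=$2 f
    {
        printf '<?xml version="1.0" encoding="UTF-8"?>\n'
        printf '<files>\n'
        for f in "$dir"/report_*.xml; do
            # Skip the unexpanded pattern when no file matches.
            [ -e "$f" ] || continue
            # Assumes names like report_en.xml: no &, < or > to escape.
            printf '  <file>%s</file>\n' "$(basename "$f")"
        done
        printf '</files>\n'
    } > "$out"
}
```

Calling `make_file_list . files.xml` would write the list next to the reports. A stylesheet could then loop over `document('files.xml')/files/file` and call `document(.)` on each entry; relative names resolve against the location of the list file, so keeping the list in the same directory as the reports avoids path handling.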

I am aware of XProc, but investing much time into it is not an option for me at the moment. Maybe someone can post an XProc solution. Preferably the solution includes workflow steps where the reports are downloaded as HTML and tidied up :)

I will be using Saxon as my XSLT processor, so if there are Saxon-specific extensions I can use, these would also be OK.

chiborg

2 Answers


You can use the standard XPath 2.0 collection() function, as implemented in Saxon 9.x.

The Saxon implementation allows a search pattern in the URI string argument of the function. After the directory path, you may therefore be able to specify a pattern for any file name that starts with report_, continues with other characters (here the two-letter code), and ends with .xml.

Example:

This XPath expression:

collection('file:///c:/?select=report_*.xml')

selects the document nodes of every XML document that resides in c:\ in a file whose name starts with report_, is followed by zero or more characters, and ends with .xml.

Dimitre Novatchev

The answer by Dimitre looks like the quickest solution in your case. But since you asked, here is an XProc alternative:

<p:declare-step version="1.0" xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" exclude-inline-prefixes="#all" name="main">

<!-- create context for p:variable with base-uri pointing to the location of this file -->
<p:input port="source"><p:inline><x/></p:inline></p:input>

<!-- any params passed in from outside get passed through to p:xslt automatically! -->
<p:input port="parameters" kind="parameter"/>

<!-- configuration options for steering input and output -->
<p:option name="input-dir" select="'./'"/>
<p:option name="input-filter" select="'^report_.*\.xml$'"/>
<p:option name="output-dir" select="'./'"/>

<!-- resolve any path to base uri of this file, to make sure they are absolute -->
<p:variable name="abs-input-dir" select="resolve-uri($input-dir, base-uri(/))"/>
<p:variable name="abs-output-dir" select="resolve-uri($output-dir, base-uri(/))"/>

<!-- first step: get list of all files in input-dir -->
<p:directory-list>
    <p:with-option name="path" select="$abs-input-dir"/>
</p:directory-list>

<!-- iterate over each file to load it -->
<p:for-each>
    <p:iteration-source select="//c:file[matches(@name, $input-filter)]"/>
    <p:load>
        <p:with-option name="href" select="resolve-uri(/c:file/@name, $abs-input-dir)"/>
    </p:load>
</p:for-each>

<!-- wrap all files in a reports element to be able to hand it in to the xslt as a single input document -->
<p:wrap-sequence wrapper="reports"/>

<!-- apply the xslt (stylesheet is loaded below) -->
<p:xslt>
    <p:input port="stylesheet">
        <p:pipe step="style" port="result"/>
    </p:input>
</p:xslt>

<!-- store the result in the output dir -->
<p:store>
    <p:with-option name="href" select="resolve-uri('merged-reports.xml', $abs-output-dir)"/>
</p:store>

<!-- loading of the stylesheet.. -->
<p:load href="process-reports.xsl" name="style"/>

</p:declare-step>

Store the above as process-reports.xpl, for instance. You can run it with XMLCalabash (http://xmlcalabash.com/download/) like this:

java -jar calabash.jar process-reports.xpl input-dir=./ output-dir=./

The above code assumes a process-reports.xsl that takes one document wrapping all reports and does a bit of processing on it. You could do the processing in pure XProc as well, but you might prefer it this way.

You could also move the p:xslt step up into the p:for-each (below the p:load); that would cause the XSLT to be applied to each report individually.

Good luck!

grtjn