I have several individual XML files containing historic letters in TEI. Now I want to merge them into one single file, using the date as the sort criterion.

A1.xml

<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:id="1">
<teiHeader>
    <title>Letter 1</title>
    <date when="19990202" n="0"></date>
</teiHeader>
<text>
        <p>Content of letter 1</p>
</text>
</TEI>

and a second file, A2.xml:

<?xml version="1.0" encoding="UTF-8"?>
    <TEI xml:id="2">
    <teiHeader>
        <title>Letter 1</title>
        <date when="20010202" n="0"></date>
    </teiHeader>
    <text>
            <p>Content of letter 2</p>
    </text>
    </TEI>

and a third one, A3.xml:

<?xml version="1.0" encoding="UTF-8"?>
    <TEI xml:id="3">
    <teiHeader>
        <title>Letter 3</title>
        <date when="18880101" n="0"></date>
    </teiHeader>
    <text>
            <p>Content of letter 3</p>
    </text>
    </TEI>

The files have consecutive names, "A001.xml" to "A999.xml", but these are not in the desired order. So my preferred output would be a single file, letters.xml:

<?xml version="1.0" encoding="UTF-8"?>
<CORRESPONDENCE>

<TEI xml:id="3">
        <teiHeader>
            <title>Letter 3</title>
            <date when="18880101" n="0"></date>
        </teiHeader>
        <text>
                <p>Content of letter 3</p>
        </text>
        </TEI>

    <TEI xml:id="1">
    <teiHeader>
        <title>Letter 1</title>
        <date when="19990202" n="0"></date>
    </teiHeader>
    <text>
            <p>Content of letter 1</p>
    </text>
    </TEI>
        <TEI xml:id="2">
        <teiHeader>
            <title>Letter 1</title>
            <date when="20010202" n="0"></date>
        </teiHeader>
        <text>
                <p>Content of letter 2</p>
        </text>
        </TEI>
</CORRESPONDENCE>

Even though I have found ways of merging several XML files into one, I can't get it to work with the sorting criterion. Is this even possible?

Ian Roberts
martinanton

2 Answers

Is this even possible?

XSLT is designed to handle any XML transformation task and is considered Turing complete, so yes, it is indeed possible.

I'm going to assume XSLT 3.0, because this is an excellent example for demonstrating a new feature of that version: xsl:merge. Not that it was impossible before, it just wasn't that simple. xsl:merge is specifically designed to work with external sources, but it can work with any input of any size (it is streamable).

XSLT 3.0 xsl:merge example

Using your example above, the following code takes all XML files matching that file pattern and creates a single file with a copy of each document, sorted by date.

<!-- xsl:initial-template, new in XSLT 3.0 is like "int main()" in C-style languages -->
<xsl:template name="xsl:initial-template">
    <!-- your other code here -->
    <result>
        <xsl:merge>

            <!-- 
            xsl:merge-source defines a source for merging. It is quite powerful. Here
            is a simple example with your data.

            Selecting the items to merge goes in two steps: first, for-each-item selects
            a sequence of anchor items (here a collection of documents, as in the OP's
            question); then the select attribute, evaluated with the focus on each
            anchor item, selects the sequence to be merged. This sequence can be of any
            length (here it selects the root tei:TEI element of each historic letter).

            The collection URI syntax depends on the processor; 'files?select=A*.xml'
            is the Saxon form. The tei prefix is assumed to be bound to
            http://www.tei-c.org/ns/1.0 on xsl:stylesheet; drop the prefix if the
            letters are in no namespace.

            The merge-key defines the key by which items in each merge sequence are sorted.
            Input that is not in that order will result in an error, unless
            sort-before-merge is also specified.
            -->
            <xsl:merge-source 
                for-each-item="collection('files?select=A*.xml')"
                select="/tei:TEI"
                sort-before-merge="true">
                <xsl:merge-key 
                    select="tei:teiHeader/tei:date/@when"
                    order="ascending" 
                    data-type="number" />
            </xsl:merge-source>

            <!-- The merge action is called once for each group of items that share
            the same merge key value. Only in this place can you use the
            current-merge-key() and current-merge-group() functions, which work
            similarly to their grouping counterparts. Iterating over
            current-merge-group() keeps every letter, even when two share a date.
            -->
            <xsl:merge-action>
                <xsl:for-each select="current-merge-group()">
                    <source original-document="{base-uri()}">
                        <xsl:copy-of select="." />
                    </source>
                </xsl:for-each>
            </xsl:merge-action>
        </xsl:merge>
    </result>
</xsl:template>
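
For the un-namespaced sample files in the question, the fragment above could be fleshed out into a complete stylesheet roughly as follows. This is only a sketch under a few assumptions: the `input-collection` parameter is a name made up for illustration, the `?select=` collection URI is the Saxon-specific form, and the paths assume that TEI is the document element and is in no namespace. It writes the CORRESPONDENCE wrapper the question asks for and sorts ascending by the `when` attribute.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: Saxon-style collection URI of the input folder -->
  <xsl:param name="input-collection" select="'files?select=A*.xml'"/>

  <xsl:template name="xsl:initial-template">
    <CORRESPONDENCE>
      <xsl:merge>
        <!-- One merge input per document; each contributes its root TEI element -->
        <xsl:merge-source
            for-each-item="collection($input-collection)"
            select="/TEI"
            sort-before-merge="yes">
          <!-- Assumes @when is a plain number such as 18880101 -->
          <xsl:merge-key select="teiHeader/date/@when"
                         order="ascending"
                         data-type="number"/>
        </xsl:merge-source>
        <xsl:merge-action>
          <!-- Copy every letter that shares the current key value -->
          <xsl:copy-of select="current-merge-group()"/>
        </xsl:merge-action>
      </xsl:merge>
    </CORRESPONDENCE>
  </xsl:template>

</xsl:stylesheet>
```

If the letters are in the TEI namespace, the element names in `select` would need a prefix bound to that namespace (or `xpath-default-namespace` on `xsl:stylesheet`).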
Abel
  • Thank you! It looks great! Nevertheless, I'm quite new and I can't get it to work. I had to remove the xsl-namespaces as OxygenXML complained about xsl being reserved. Afterwards i could transform, but it just transforms the first file I start the transformation with, not all in the folder. What am I doing wrong? – martinanton Sep 18 '15 at 11:09
  • @martinanton, if you are using oXygen, then underneath you are using Saxon. Make sure to specify `version="3.0"` in the XSLT file. Yes, oXygen complains about `xsl:initial-template` (they know about it, they will fix it), but you can use any other backwards compatible name. Also, [check if you have the collection syntax correct](http://www.saxonica.com/documentation9.5/sourcedocs/collections.html) (test with the xpath `copy-of(collection('your-file-spec'))`). – Abel Sep 18 '15 at 12:18
  • @martinanton Note also that you need a paid licence for Saxon - the "PE" edition for basic level XSLT 3.0 support, and "EE" if you want streaming. The free "HE" version only supports XSLT 2.0. – Ian Roberts Sep 18 '15 at 12:44
  • Thank you! OxygenXML features all versions of Saxon, so this is not the problem! You're right i might have to dig into the collection-syntax to get it working. I hopefully will do so one day, as i can see the elegance of your solution! – martinanton Sep 18 '15 at 18:44
  • @mart, read the collection syntax doc at Saxonica, it's quite simple, really :) – Abel Sep 18 '15 at 20:52
  • Try something like `collection('files?select=A*.xml')` – Michael Kay Sep 19 '15 at 14:13
  • @mart, see Michael's comment on syntax. – Abel Sep 19 '15 at 20:46
  • @MichaelKay Thank you! I tried that, but it still copies only the initial letter, not all three, I'm afraid. – martinanton Sep 21 '15 at 08:33
  • Superb explanation with XSLT3, plus one. – Rudramuni TP Jul 13 '17 at 10:58
As you simply want to concatenate the XML documents, with Saxon 9 and XSLT 2.0 it is as easy as

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="2.0">

<xsl:param name="file-suffix" as="xs:string" select="'A*.xml'"/>

<xsl:template match="/" name="main">
  <CORRESPONDENCE>
    <xsl:perform-sort select="collection(concat('.?select=', $file-suffix))/*">
      <xsl:sort select="teiHeader/date/xs:integer(@when)"/>
    </xsl:perform-sort>
  </CORRESPONDENCE>
</xsl:template>

</xsl:stylesheet>

You would run that with the command line options -it:main -xsl:stylesheet.xsl or, if needed, with a primary input document; either way, the documents to be processed are simply fetched through the collection as shown.

If the elements in your input samples are in the namespace http://www.tei-c.org/ns/1.0, as Abel commented, then you would need to change the code to

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  exclude-result-prefixes="xs"
  version="2.0">

<xsl:param name="file-suffix" as="xs:string" select="'A*.xml'"/>

<xsl:template match="/" name="main">
  <CORRESPONDENCE>
    <xsl:perform-sort select="collection(concat('.?select=', $file-suffix))/*">
      <xsl:sort select="teiHeader/date/xs:integer(@when)"/>
    </xsl:perform-sort>
  </CORRESPONDENCE>
</xsl:template>

</xsl:stylesheet>
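
If the result contains fewer letters than expected, it may help to check first what the collection actually returns. The following is a minimal diagnostic sketch, assuming Saxon and the same `.?select=A*.xml` collection URI as above; it copies nothing and only prints the URI of each document that was found, one per line.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output method="text"/>

  <xsl:template match="/" name="main">
    <!-- Each item returned by collection() is a document node -->
    <xsl:for-each select="collection('.?select=A*.xml')">
      <xsl:value-of select="document-uri(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

It can be run the same way, with -it:main. A relative collection URI is resolved against the location of the stylesheet, so an empty or short list usually points to the wrong folder or file pattern rather than to the sorting.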
Martin Honnen
  • I agree, if no additional tweaking is needed and the individual files have the same structure for the key and documents, then this is just as simple. `teiHeader` should probably be in the TEI namespace though, as his original question had it declared. – Abel Sep 18 '15 at 12:21