
I have two lists of consecutive elements that relate to each other. I want to combine them, but my solution is both slow and not elegant. I am using XSLT 2.0, Saxon.

List1.xml:

<data>
<w tag="a">asda</w>
<w tag="c">sdsd</w>
<w tag="a">value2</w>
<w tag="f">fdxcc</w>
<w tag="c">no</w>
</data>

List2.xml:

<data>
<w class="2">asda</w>
<w class="5">sdsd</w>
<w class="6">value2</w>
<w class="1">fdxcc</w>
<w class="2">no</w>
</data>

Note that neither the values of @class and @tag nor the contents of the elements are unique; what links the two lists is identical content in identical sequence. (The actual problem is more complicated, since I need to evaluate the elements of the first list using those of the second.)

Intended result (same order):

<w tag="a" class="2">asda</w>
<w tag="c" class="5">sdsd</w>
<w tag="a" class="6">value2</w>
<w tag="f" class="1">fdxcc</w>
<w tag="c" class="2">no</w>

Now the obvious way to achieve this is to walk through one list and pick up the values from the second. I do it like this:

<xsl:template match="/">
<xsl:variable name="list1" select="doc('list1.xml')"/>
<xsl:variable name="list2" select="doc('list2.xml')"/>

<xsl:for-each select="$list1//w">
<xsl:copy>
<xsl:copy-of select="@tag"/>
<!-- remember the position of the current w in $list1 -->
<xsl:variable name="thispos" select="position()"/>
<!-- pick up @class from the w at the same position in $list2 -->
<xsl:copy-of select="$list2//w[position()=$thispos]/@class"/>
<xsl:copy-of select="text()"/>
</xsl:copy>
</xsl:for-each>
</xsl:template>

I have two questions: (a) is there really no better way to refer to the position in $list1 than to save it in a variable? (b) related to this question: this solution is MUCH too slow when dealing with hundreds of thousands of items. What would be a better solution?

  • **1.** Are these lists really hard-coded in your stylesheet, as would seem from the example? **2.** I am not sure why you need the (combined) result at all, when you can refer to the "other" element directly (and efficiently!) by using a **key**. -- **P.S.** Please state which version of XSLT you are using. – michael.hor257k Oct 08 '14 at 16:08
  • 1. No, they are not hardcoded; they are read into the variables from large files. This is just a simplified example. 2. I was wondering whether keys would be the way to accomplish this, but maybe I am misinterpreting you. I need the result written to file, no dynamic elements. Ultimately, this goes to a text file. I am using XSLT 2.0 with Saxon. – Ruprecht von Waldenfels Oct 08 '14 at 19:19
  • re 1: Please adjust your example accordingly, as it makes quite a difference. re 2: **key** is the XSLT mechanism for performing a lookup. I am not sure what you mean by "dynamic elements". Ultimately, XSLT produces an output tree, which can be (and most often is) written to a file. – michael.hor257k Oct 08 '14 at 19:39
  • Yes, I had already edited accordingly. Re 2: As I said, I expected XSLT keys to play a role, but I didn't understand how they work, not even from reading Kay's XSLT book, which is all I normally need. I just thought you could also mean some dynamic links between XML files. As I said, I don't understand how to use keys to achieve this. It would be nice if you could clarify. And I am very surprised this version is so inefficient; I would think that consecutive calls for position could be optimized. – Ruprecht von Waldenfels Oct 08 '14 at 19:48
  • I meant: could you show the original file (or are there two files)? Minimal, but complete examples preferred, of course. – michael.hor257k Oct 08 '14 at 20:07

2 Answers


If I understand correctly, you can match either on a common value or on the position. Here's matching on value:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:key name="list2" match="w" use="." />

<xsl:template match="/">
    <root>
        <xsl:for-each select="data/w">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:copy-of select="key('list2', ., document('List2.xml'))/@*"/>
                <xsl:value-of select="."/>
            </xsl:copy>
        </xsl:for-each>
    </root>
</xsl:template>

</xsl:stylesheet>

and here's matching on "position":

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:key name="list2" match="w" use="count(preceding-sibling::w)" />

<xsl:template match="/">
    <root>
        <xsl:for-each select="data/w">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:copy-of select="key('list2', count(preceding-sibling::w), document('List2.xml'))/@*"/>
                <xsl:value-of select="."/>
            </xsl:copy>
        </xsl:for-each>
    </root>
</xsl:template>

</xsl:stylesheet>

In both cases, the result is:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <w tag="a" class="2">asda</w>
   <w tag="c" class="5">sdsd</w>
   <w tag="a" class="6">value2</w>
   <w tag="f" class="1">fdxcc</w>
   <w tag="c" class="2">no</w>
</root>

Note:
As I mentioned earlier, if that's not your final result, there's no need to construct it. As you can see, the "other" value is always available from the context of List1; you only need to point at it when you need it.
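
To illustrate the note above, here is a minimal sketch that evaluates the elements of List1 against List2 without building the combined elements first. It reuses the positional key from the second stylesheet. The test on `@class = '2'`, the tab-separated text output, and the variable names `list2-doc` and `other` are assumptions made for illustration only; they are not part of the original problem.

```xml
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<!-- Same positional key as above: 0-based index of each w among its siblings -->
<xsl:key name="list2" match="w" use="count(preceding-sibling::w)" />

<!-- Hypothetical variable name; loads List2.xml once -->
<xsl:variable name="list2-doc" select="document('List2.xml')"/>

<xsl:template match="/">
    <xsl:for-each select="data/w">
        <!-- The corresponding w in List2, looked up through the key -->
        <xsl:variable name="other"
            select="key('list2', count(preceding-sibling::w), $list2-doc)"/>
        <!-- Hypothetical evaluation: keep only items whose class in List2 is 2 -->
        <xsl:if test="$other/@class = '2'">
            <xsl:value-of select="concat(@tag, '&#9;', ., '&#10;')"/>
        </xsl:if>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Applied to List1.xml from the question, this should write one line for `asda` and one for `no`, the two items whose counterpart in List2.xml has `class="2"`.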

michael.hor257k
  • Yes, this works! I didn't understand keys before; now I do, and this is definitely speeding things up. This leads to a follow-up question. I tested it and it is MUCH faster to use position() instead of count(preceding-sibling::w) in the XPath key() function. Is there any way to use the position() function in the key definition above? This would drastically speed things up. – Ruprecht von Waldenfels Oct 09 '14 at 10:10
  • You can **probably** use `position() - 1` when you call the key() **function**. I say *probably*, because position() is context-dependent. The same node will have different positions in different contexts. You cannot use position() in the xsl:key **element**, because a key has no context. – michael.hor257k Oct 09 '14 at 10:57
  • Yes, position()-1 works as expected in the key() function, but the fact that it cannot be used in the xsl:key element still means that building the index is quadratic in the number of w items; in other words, still rather slow. I'll try to think of another solution. – Ruprecht von Waldenfels Oct 09 '14 at 12:22
  • @RuprechtvonWaldenfels IMHO, the real solution to the problem lies upstream of you: tell the person preparing the input to *index* the items. You can test this proposition by converting a couple of documents yourself first, then run your "real" code while matching on a common id value. – michael.hor257k Oct 09 '14 at 14:02
  • Sure, that would help, but that's changing the circumstances instead of solving the problem, which is more important to me at this point. The solution is to index the input files oneself, and then use keys to get these indexes. In the production example using several hundreds of thousands of entries, this works fine. I've edited the solution to reflect that approach. – Ruprecht von Waldenfels Oct 10 '14 at 09:37
  • And just a comment: the values are not guaranteed to be unique, as I said in the description of the problem, so only the second solution is actually correct. – Ruprecht von Waldenfels Oct 10 '14 at 09:42
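
The comments above describe indexing the input oneself and then looking the index up through a key, but the edited stylesheet is not shown in this thread. The following is only a sketch of one way that idea could look in XSLT 2.0, not the code actually used: a first pass copies List2 into a temporary tree and stamps each `w` with its position, and the key then matches on that stored number instead of `count(preceding-sibling::w)`. The attribute name `n`, the key name `by-n`, and the variable name `list2-indexed` are made up for this example.

```xml
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- Key on the stored index attribute (hypothetical name: n) -->
<xsl:key name="by-n" match="w" use="@n" />

<!-- First pass: copy List2 into a temporary tree, numbering each w.
     position() is usable here because xsl:for-each supplies the context. -->
<xsl:variable name="list2-indexed">
    <xsl:for-each select="document('List2.xml')/data/w">
        <w n="{position()}">
            <xsl:copy-of select="@*"/>
            <xsl:value-of select="."/>
        </w>
    </xsl:for-each>
</xsl:variable>

<xsl:template match="/">
    <root>
        <xsl:for-each select="data/w">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <!-- string() is needed: @n is untyped text, position() is an integer -->
                <xsl:copy-of select="key('by-n', string(position()), $list2-indexed)/@class"/>
                <xsl:value-of select="."/>
            </xsl:copy>
        </xsl:for-each>
    </root>
</xsl:template>

</xsl:stylesheet>
```

The design choice here is that each index is computed once, in a single pass over List2, so the key definition no longer has to walk the preceding siblings of every `w`. Whether this is faster in practice depends on the processor and the data, so it is worth measuring on the real files.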

The problem has long been solved, but in case it helps anyone trying to combine two lists of nodes, I used a solution from the question "Using xslt get node value at X position":

<xsl:for-each select="items[not(@save)]/text">
    <xsl:variable name="pos" select="position()" />
    <option>
        <xsl:attribute name="value"><xsl:value-of select="../../items[@save]/text[position() = $pos]" /></xsl:attribute>
        <xsl:value-of select="."/>
    </option>
</xsl:for-each>

This allowed me to combine two sets of nodes to create a HTML option list with values, with XSLT 1.0.
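
For context, the fragment above can be wrapped in a complete XSLT 1.0 stylesheet along the following lines. This is a sketch under an assumed input shape, since the answer does not show its source document: a hypothetical `form` root containing two `items` elements, one carrying a `save` attribute (the option values) and one without it (the option labels), each holding the same number of `text` children.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes"/>

<!-- Assumed input (hypothetical):
     <form>
       <items save="true"><text>1</text><text>2</text></items>
       <items><text>One</text><text>Two</text></items>
     </form> -->
<xsl:template match="/form">
    <select>
        <xsl:for-each select="items[not(@save)]/text">
            <!-- Save the position: inside the predicate, position() has a new context -->
            <xsl:variable name="pos" select="position()" />
            <option value="{../../items[@save]/text[position() = $pos]}">
                <xsl:value-of select="."/>
            </option>
        </xsl:for-each>
    </select>
</xsl:template>

</xsl:stylesheet>
```

Like the code in the question, this looks up the partner node by position on every iteration, so it is suited to short lists such as form options rather than to hundreds of thousands of items.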

Kevin Teljeur