This chapter in my XSLT saga is an extension of the question here. Thanks to all of you who have helped me get this far (@Martin Honnen, @Ian Roberts, @Tim C, and anyone else I missed)!
Here is my current problem:
- I reorder some siblings in
A_v1.xml
to createA_v2.xml
. I now consider these two files to be different "versions" of the same file. The files two files have the exact same content, only some siblings are in a different order. Another way of saying it, each element inA_v2.xml
still has the same parent as it did inA_v1.xml
, but it may now occur before siblings it used to occur after, or may occur after siblings it used to occur before. - I transform
A_v1.xml
intoA_v1_transformed.xml
- I transform
A_v2.xml
intoA_v2_transformed.xml
- I compare
A_v1_transformed.xml
toA_v2_transformed.xml
and to my dismay they are not identical. Further more neither of them are in the expected order shown inexpected.xml
. They have the same content, but the elements are not sorted in the same order.
My first sort is <xsl:sort select="local-name()"/>
. @G. Ken Holman turned me onto <xsl:sort select="."/>
(which has the same effect as <xsl:sort select="self::*"/>
which I was using). When I use those two sorts in combination I get almost exactly what I want, but in some places it seems the expected alphabetical order is just randomly broken.
I have beefed up my sample files. To keep the question short I just put them on pastebin.
Here is one of the transformed files with comments added by me to help you understand where/why I think the transform sorted these files incorrectly. I didn't comment the other transformed file because it has similar "failures".
A_v1_transformed_with_comments.xml
Both of the transformed documents should have the same checksum as expected.xml
, but they don't. That is my biggest concern. Alphabetical sorting seems the most sane way to sort, but so long as the transform sorted in some sane way I couldn't care less how the sort happened so long as the sort is repeatable among different "versions" of the same file.
The following XLS files both yield the same result, but the "multi-pass" version may be easier to understand.
Points for discussion:
- I have noticed that when sorting alphabetically CAPITALIZED letters take precedence. Even if the capitalized letter comes after a lower case letter alphabetically it will come first in the sort.
Partial success...
I think I may have stumbled onto a partial solution myself, but I am unclear why it works. If you look at my xsl_multi_pass.xsl
file you will see:
<!-- Third pass with sortElements mode templates -->
<xsl:variable name="sortElementsRslt">
<xsl:apply-templates mode="sortElements" select="$sortAttributesRslt"/>
</xsl:variable>
<!-- Fourth pass with deDup mode templates -->
<xsl:apply-templates mode="deDup" select="$sortElementsRslt"/>
If I turn that into:
<!-- Third pass with sortElements mode templates -->
<xsl:variable name="sortElementsRslt1">
<xsl:apply-templates mode="sortElements" select="$sortAttributesRslt"/>
</xsl:variable>
<!-- Fourth pass with sortElements mode templates -->
<xsl:variable name="sortElementsRslt2">
<xsl:apply-templates mode="sortElements" select="$sortElementsRslt1"/>
</xsl:variable>
<!-- Fifth pass with deDup mode templates -->
<xsl:apply-templates mode="deDup" select="$sortElementsRslt2"/>
This sorts the elements twice, I don't know why it is necessary. The result using the example files I have provided is what I expected minus the CAPITALIZED letters taking precedence, but that doesn't bother me so long as the result is consistent which it appears to be. The problem is that this "solution" causes another part of the real files I'm working with to be sorted inconsistently.
SUCCESS!
I think I finally got this working 100% how I want. I incorporated the function given in the answer here by @Dimitre Novatchev to elements by their attribute names and values. I still have to perform two passes to sort the elements (applying the exact same templates twice) as I described above for some reason, but it only takes an extra 3 seconds on a 20MB file, so I'm not too worried about it.
Here is the final result: