
This chapter in my XSLT saga is an extension of the question here. Thanks to all of you who have helped me get this far (@Martin Honnen, @Ian Roberts, @Tim C, and anyone else I missed)!

Here is my current problem:

  1. I reorder some siblings in A_v1.xml to create A_v2.xml. I now consider these two files to be different "versions" of the same file. The two files have exactly the same content; only the order of some siblings differs. Put another way, each element in A_v2.xml still has the same parent as it did in A_v1.xml, but it may now occur before siblings it used to follow, or after siblings it used to precede.
  2. I transform A_v1.xml into A_v1_transformed.xml
  3. I transform A_v2.xml into A_v2_transformed.xml
  4. I compare A_v1_transformed.xml to A_v2_transformed.xml and, to my dismay, they are not identical. Furthermore, neither of them is in the expected order shown in expected.xml. They have the same content, but the elements are not sorted in the same order.

My first sort is <xsl:sort select="local-name()"/>. @G. Ken Holman turned me onto <xsl:sort select="."/> (which has the same effect as <xsl:sort select="self::*"/> which I was using). When I use those two sorts in combination I get almost exactly what I want, but in some places it seems the expected alphabetical order is just randomly broken.

I have beefed up my sample files. To keep the question short I just put them on pastebin.

A_v1.xml

A_v2.xml

A_v1_transformed.xml

A_v2_transformed.xml

Here is one of the transformed files with comments added by me to help you understand where/why I think the transform sorted these files incorrectly. I didn't comment the other transformed file because it has similar "failures".

A_v1_transformed_with_comments.xml

Both of the transformed documents should have the same checksum as expected.xml, but they don't. That is my biggest concern. Alphabetical sorting seems the sanest choice, but I don't really care how the sort is done as long as it is repeatable across different "versions" of the same file.

expected.xml
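For reference, the checksum comparison described above can be sketched in Java as follows. This is only an illustration: the class name ChecksumDemo and the choice of SHA-256 are assumptions, not what was actually used to compare the pastebin files.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumDemo {
    // Returns the SHA-256 digest of the given bytes as a lowercase hex string.
    public static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Usage: java ChecksumDemo A_v1_transformed.xml A_v2_transformed.xml
    public static void main(String[] args) throws Exception {
        String first = sha256Hex(Files.readAllBytes(Paths.get(args[0])));
        String second = sha256Hex(Files.readAllBytes(Paths.get(args[1])));
        System.out.println(first.equals(second) ? "identical" : "different");
    }
}
```

Because the digest is taken over raw bytes, any difference in element order, whitespace, or encoding between the two transformed files changes the result.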

The following XSL files both yield the same result, but the "multi-pass" version may be easier to understand.

xsl_concise.xsl

xsl_multi_pass.xsl

Points for discussion:

  1. I have noticed that when sorting alphabetically, uppercase letters take precedence: a name starting with an uppercase letter sorts before one starting with a lowercase letter, even if its letter comes later in the alphabet.
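That behavior is what a comparison by character code would produce, since every uppercase ASCII letter has a lower code than every lowercase one. Whether a given XSLT processor actually sorts this way by default is implementation-dependent, so treat this as a likely explanation rather than a confirmed one. The Java sketch below (the class name CaseOrderDemo is made up for illustration) shows the same effect next to a case-insensitive ordering:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class CaseOrderDemo {
    // Natural String order compares character codes, so 'Z' (90) sorts before 'a' (97).
    public static List<String> byCharacterCode(List<String> names) {
        List<String> out = new ArrayList<>(names);
        Collections.sort(out);
        return out;
    }

    // Case-insensitive order; ties fall back to character-code order so the
    // result stays deterministic when names differ only in case.
    public static List<String> ignoringCase(List<String> names) {
        List<String> out = new ArrayList<>(names);
        out.sort(String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.naturalOrder()));
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("apple", "Zebra", "banana", "Apple");
        System.out.println(byCharacterCode(names)); // [Apple, Zebra, apple, banana]
        System.out.println(ignoringCase(names));    // [Apple, apple, banana, Zebra]
    }
}
```

On the XSLT side, xsl:sort has lang and case-order attributes (plus collation in XSLT 2.0) that can influence this, though how they are honored varies by processor. For the goal here, either ordering is fine as long as it is the same on every run.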

Partial success...

I think I may have stumbled onto a partial solution myself, but I am unclear why it works. If you look at my xsl_multi_pass.xsl file you will see:

    <!-- Third pass with sortElements mode templates -->
    <xsl:variable name="sortElementsRslt">
        <xsl:apply-templates mode="sortElements" select="$sortAttributesRslt"/>
    </xsl:variable>

    <!-- Fourth pass with deDup mode templates -->
    <xsl:apply-templates mode="deDup" select="$sortElementsRslt"/>

If I turn that into:

    <!-- Third pass with sortElements mode templates -->
    <xsl:variable name="sortElementsRslt1">
        <xsl:apply-templates mode="sortElements" select="$sortAttributesRslt"/>
    </xsl:variable>

    <!-- Fourth pass with sortElements mode templates -->
    <xsl:variable name="sortElementsRslt2">
        <xsl:apply-templates mode="sortElements" select="$sortElementsRslt1"/>
    </xsl:variable>

    <!-- Fifth pass with deDup mode templates -->
    <xsl:apply-templates mode="deDup" select="$sortElementsRslt2"/>

This sorts the elements twice, and I don't know why that is necessary. With the example files I have provided, the result is what I expected, apart from the CAPITALIZED letters taking precedence, and that doesn't bother me as long as the result is consistent, which it appears to be. The problem is that this "solution" causes another part of the real files I'm working with to be sorted inconsistently.
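One possible explanation, offered as a guess because it depends on how the sortElements templates are written: if the key for <xsl:sort select="."/> is computed on the input tree, then an element's string value still reflects the unsorted order of its descendants. Two "versions" of a file can therefore produce different keys for the same element, and only a later pass sees keys built from already-sorted descendants. The Java sketch below models this with a made-up Node class (not the real stylesheet or the XML DOM) and compares a top-down sort with a bottom-up one:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BottomUpSortDemo {
    /** Hypothetical stand-in for an XML element: a name, some text, and child elements. */
    public static final class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String name, String text, Node... kids) {
            this.name = name;
            this.text = text;
            children.addAll(Arrays.asList(kids));
        }

        /** Rough analogue of the XPath string value: own text plus descendant text, in document order. */
        String stringValue() {
            StringBuilder sb = new StringBuilder(text);
            for (Node c : children) sb.append(c.stringValue());
            return sb.toString();
        }

        public String serialize() {
            StringBuilder sb = new StringBuilder("<" + name + ">" + text);
            for (Node c : children) sb.append(c.serialize());
            return sb.append("</" + name + ">").toString();
        }
    }

    // Mirrors sorting by local-name() first and by "." second.
    static final Comparator<Node> BY_NAME_THEN_VALUE =
            Comparator.comparing((Node n) -> n.name).thenComparing(Node::stringValue);

    /** Top-down: order the children using keys taken from the still-unsorted input, then recurse. */
    public static Node sortTopDown(Node in) {
        List<Node> ordered = new ArrayList<>(in.children);
        ordered.sort(BY_NAME_THEN_VALUE);
        Node out = new Node(in.name, in.text);
        for (Node c : ordered) out.children.add(sortTopDown(c));
        return out;
    }

    /** Bottom-up: sort every child's subtree first, then order the children by keys from the sorted subtrees. */
    public static Node sortBottomUp(Node in) {
        Node out = new Node(in.name, in.text);
        for (Node c : in.children) out.children.add(sortBottomUp(c));
        out.children.sort(BY_NAME_THEN_VALUE);
        return out;
    }

    // Two "versions" of one document: same content, siblings in a different order.
    public static Node versionOne() {
        return new Node("root", "",
                new Node("item", "", new Node("x", "b"), new Node("x", "a")),
                new Node("item", "", new Node("x", "a"), new Node("x", "c")));
    }

    public static Node versionTwo() {
        return new Node("root", "",
                new Node("item", "", new Node("x", "a"), new Node("x", "b")),
                new Node("item", "", new Node("x", "c"), new Node("x", "a")));
    }

    public static void main(String[] args) {
        String onePassV1 = sortTopDown(versionOne()).serialize();
        String onePassV2 = sortTopDown(versionTwo()).serialize();
        String twoPassV1 = sortTopDown(sortTopDown(versionOne())).serialize();
        String twoPassV2 = sortTopDown(sortTopDown(versionTwo())).serialize();
        System.out.println(onePassV1.equals(onePassV2)); // false
        System.out.println(twoPassV1.equals(twoPassV2)); // true
        System.out.println(sortBottomUp(versionOne()).serialize()
                .equals(sortBottomUp(versionTwo()).serialize())); // true
    }
}
```

In this model the number of top-down passes needed grows with nesting depth, which might explain why two passes fix the sample files yet leave deeper parts of the real files inconsistent. A bottom-up sort needs only one pass here, though ties between siblings with equal keys would still keep their input order.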

SUCCESS!

I think I finally got this working 100% how I want. I incorporated the function given in the answer here by @Dimitre Novatchev to sort elements by their attribute names and values. For some reason I still have to perform two passes to sort the elements (applying the exact same templates twice), as I described above, but it only takes an extra 3 seconds on a 20MB file, so I'm not too worried about it.

Here is the final result:

xsl_2.0_full_document_sorter.xsl

ubiquibacon

2 Answers


In a nutshell, my ultimate goal with all of my XSLT questions is a stylesheet that, when applied to a file, will always generate the same result even if run on different "versions" of that file. A different "version" of a file would be one that has the exact same content, just in a different order. That means an element's attributes may have been moved around and that elements may occur earlier/later than they previously did.

Have you considered a different tool rather than XSLT for this purpose? The goal you've described sounds to me pretty much exactly like the definition of similar() in XMLUnit:

    import org.custommonkey.xmlunit.Diff;

    // control and test are the two XML documents you want to compare; they can
    // be String, Reader, org.w3c.dom.Document or org.xml.sax.InputSource
    Diff d = new Diff(control, test);
    assert d.similar();
Ian Roberts
  • I was looking for different tools, but I didn't know about XMLUnit. If it can diff two files based on my description of equality then it could get me by for now, but eventually I will need something that can take various "versions" of a file that are equal, but sorted differently, and transform each of those versions into a canonicalized and sorted output. The canonicalized and sorted outputs for all versions of a file would be binary-equivalent to each other, each having the same checksum. – ubiquibacon Sep 20 '13 at 12:55
  • I have updated my question and tried to clarify some points while also providing better examples of what I am attempting to accomplish. In the meantime I'll check out XMLUnit. – ubiquibacon Sep 20 '13 at 15:53
  • This really wasn't the answer I was looking for, but seeing that it is the only answer (other than my own) I'll award you the bounty. At least I know about XMLUnit now! – ubiquibacon Sep 23 '13 at 17:29

SUCCESS!

I think I finally got this working 100% how I want. I incorporated the function given in the answer here by @Dimitre Novatchev to sort elements by their attribute names and values. For some reason I still have to perform two passes to sort the elements (applying the exact same templates twice), as described in the question, but it only takes an extra 3 seconds on a 20MB file, so I'm not too worried about it.

Here is the final result:

xsl_2.0_full_document_sorter.xsl

This transform is 100% generic and should be usable on any XML document to sort it in what I would consider the sanest way possible. The major benefit of this stylesheet is that it transforms multiple files that have the same content in different orders in exactly the same way, so the transformed results of all such files will be identical.

ubiquibacon