I'm version-controlling a bunch of XML files that are generated by third-party applications. Unfortunately, the files are often saved in ways that make version control more cumbersome than it should be. The applications might swap elements around:
<root>
- <b>bar</b>
<a>foo</a>
+ <b>bar</b>
</root>
or reorder attributes:
-<root a="foo" b="bar"/>
+<root b="bar" a="foo"/>
or change/remove indentation:
-<root a="foo" b="bar"/>
+<root
+ a="foo"
+ b="bar"/>
To be clear, these files do not mix text and element nodes (as in <a>foo <b>bar</b></a>), and there's no semantic difference between the differently ordered files, so it's safe to reorder them any way we want.
I've partially solved this by using xsltproc and the following stylesheet to sort elements:
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
  <output method="xml" indent="yes" encoding="UTF-8"/>
  <strip-space elements="*"/>
  <!-- Copy processing instructions and attributes unchanged. -->
  <template match="processing-instruction()|@*">
    <copy>
      <apply-templates select="node()|@*"/>
    </copy>
  </template>
  <!-- Copy each element, sorting its children by name and then by up to six attributes. -->
  <template match="*">
    <copy>
      <apply-templates select="@*"/>
      <apply-templates>
        <sort select="name()"/>
        <sort select="@*[1]"/>
        <sort select="@*[2]"/>
        <sort select="@*[3]"/>
        <sort select="@*[4]"/>
        <sort select="@*[5]"/>
        <sort select="@*[6]"/>
      </apply-templates>
    </copy>
  </template>
</stylesheet>
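However, I've recently learned that the order of attributes on an element is not defined, so @*[1] need not select the same attribute in two equivalent files, and sorting by the six "first" attributes won't work in general. And of course this stylesheet doesn't sort the attributes themselves.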
(I've used "normalize" in the title because I don't necessarily want to sort the elements in some particular way, it just seemed like the most obvious way to make sure the textual difference between two semantically identical files is empty.)
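To illustrate the kind of normalization I mean, here is a minimal Python sketch. It is not an XSLT solution, just a statement of the desired result. It assumes Python 3.8+ for xml.etree.ElementTree.canonicalize, which writes attributes in sorted order, and the helper names sort_key, sort_children and normalize are made up for this example. Siblings are sorted by tag, then by their name-sorted attributes, then by text, so siblings that differ only in their descendants keep their input order.

```python
import xml.etree.ElementTree as ET


def sort_key(elem):
    # Hypothetical key: tag, then attributes sorted by name, then stripped text.
    return (elem.tag, sorted(elem.attrib.items()), (elem.text or "").strip())


def sort_children(elem):
    # Normalize each subtree first, then reorder the direct children in place.
    for child in elem:
        sort_children(child)
    elem[:] = sorted(elem, key=sort_key)


def normalize(xml_text):
    root = ET.fromstring(xml_text)
    sort_children(root)
    # canonicalize() emits attributes in sorted order; strip_text=True
    # drops the whitespace-only text that indentation leaves behind.
    return ET.canonicalize(ET.tostring(root, encoding="unicode"), strip_text=True)
```

With this, the three variations shown at the top (swapped elements, reordered attributes, changed indentation) should each normalize to the same text as their counterpart, so the diff between them would be empty.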
Is there some way to achieve such ordering?
Despite the similar title, this is different from "XSLT sort by tag name and attribute value": that question involves only a single attribute, and its accepted solution isn't sufficiently general.