1

I am working with an XML file (validating it). I need to edit some attributes before calculating the CRC32 checksum of the entire file. I am using a DOM parser and XPath. After editing the file, I convert it to a byte array for the CRC function:

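    // Blank the stored checksum so that it does not influence the new one.
    Node file_crc = (Node) xPath.compile("/IODevice/Stamp/@crc").evaluate(doc, XPathConstants.NODE);
    file_crc.setTextContent("");
    // Serialize the edited DOM tree into an in-memory byte array.
    bos = new ByteArrayOutputStream();
    result = new StreamResult(bos);
    try {
        transformer.transform(new DOMSource(doc), result);
        // Compute the CRC32 of the serialized bytes.
        crc.reset();
        crc.update(bos.toByteArray());
    } catch (TransformerException ex) {
        return false;
    }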

The trouble is that the DOM parser changes the order of attributes in the XML file (it sorts them alphabetically), which causes an invalid checksum for the file. How can I prevent the attribute order from changing?

Constantine

2 Answers

4

The order of attributes is not significant in XML. Applications are free to store attributes of an element in any order they like. So, this behaviour is to be expected from DOM and XPath.

As far as I understand, CRC32 is ill-suited to XML documents, because documents such as

    <root a="1" b="2"/>

and

    <root b="2" a="1"/>

are effectively the same. As a rule, you should not write XML applications that treat attribute order as significant, because there is no way to control it. If an order has to be imposed at all, attributes and namespace declarations should be listed in "ascending lexicographic order" (xml.com).
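To make the mismatch concrete, here is a minimal sketch using only JDK classes. The class and method names (`AttributeOrderDemo`, `rawCrc`, `parseRoot`) are made up for illustration. It compares the two documents above once as raw bytes and once as parsed DOM nodes, using `Node.isEqualNode`, which ignores the position of attributes in the attribute map.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original question's code.
public class AttributeOrderDemo {

    // CRC32 over the UTF-8 bytes of the text exactly as written.
    public static long rawCrc(String xml) {
        CRC32 crc = new CRC32();
        crc.update(xml.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Parse the text and return the root element of the resulting DOM tree.
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        String first = "<root a=\"1\" b=\"2\"/>";
        String second = "<root b=\"2\" a=\"1\"/>";

        // Byte-level comparison: sensitive to attribute order.
        System.out.println("raw CRC32 equal: " + (rawCrc(first) == rawCrc(second)));

        // DOM-level comparison: attribute order plays no role.
        System.out.println("DOM nodes equal: "
                + parseRoot(first).isEqualNode(parseRoot(second)));
    }
}
```

The byte-level checksums differ while the DOM comparison reports the two elements as equal, which is why a checksum over the serialized text is fragile for XML.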


The relevant piece of info from the XML specification says:

"Note that the order of attribute specifications in a start-tag or empty-element tag is not significant."

Perhaps you'll appreciate a link to some more opinions on this?

Mathias Müller
  • It wasn't my decision to use this mechanism :) I just need to work around the issue. – Constantine Nov 01 '14 at 04:57
  • @KostyaKrivomaz How about parsing the original file into a DOM representation and serializing it without changing anything? Then the attributes might be in lexicographic order, and you can use that intermediate file for the checksum. – Mathias Müller Nov 01 '14 at 14:32
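The round-trip idea from the comment above could look roughly like the sketch below. It assumes the JDK's default DOM parser and identity `Transformer`, which in practice write attributes in name order; that ordering is implementation behaviour, not something the XML or JAXP specifications guarantee. The names `RoundTripCrc` and `roundTripCrc` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.util.zip.CRC32;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the original question's code.
public class RoundTripCrc {

    // Parse the XML, serialize it again unchanged, and checksum the result.
    // Both sides of a comparison must go through this same round trip.
    public static long roundTripCrc(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // A Transformer created without a stylesheet copies its input.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            identity.transform(new DOMSource(doc), new StreamResult(bos));

            CRC32 crc = new CRC32();
            crc.update(bos.toByteArray());
            return crc.getValue();
        } catch (Exception ex) {
            throw new IllegalStateException("Round trip failed", ex);
        }
    }

    public static void main(String[] args) {
        long first = roundTripCrc("<root a=\"1\" b=\"2\"/>");
        long second = roundTripCrc("<root b=\"2\" a=\"1\"/>");
        System.out.println("round-trip CRC32 equal: " + (first == second));
    }
}
```

This only helps if the stored checksum was produced from the same kind of round-tripped output. A checksum computed over the original file's bytes by another tool will still not match.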
3

The order of attributes is defined to be insignificant in XML: no conformant XML application should produce results that depend on the order in which attributes appear. Therefore, serializers (code that produces lexical XML as output) will usually give you no control over the order.

Now, it would sometimes be nice to have that control for cosmetic reasons, because XML is designed to be human-readable. So there's a valid reason for wanting the feature. But the fact is, I know of no serializer that offers it.

John Jerik