11

I am writing a Java program that reads an XML file, makes some modifications, and writes back the XML.

Using the standard Java XML DOM API, the order of the attributes is not preserved.

That is, if I have an input file such as:

<person first_name="john" last_name="lederrey"/>

I might get an output file as:

<person last_name="lederrey" first_name="john"/>

That's correct, because the XML specification says that the order of attributes is not significant.
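For reference, here is a minimal sketch for observing the order in which the standard DOM reports attributes on a given JDK. The class and method names are made up for illustration. The DOM specification states that a `NamedNodeMap` is not maintained in any particular order, so whatever this prints is an implementation detail and should not be relied on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class DomAttributeOrderProbe {

    /** Returns the root element's attribute names in the order the DOM reports them. */
    static List<String> attributeNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NamedNodeMap attributes = doc.getDocumentElement().getAttributes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            names.add(attributes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // The source order here is last_name, then first_name;
        // the printed order depends on the DOM implementation in use.
        System.out.println(attributeNames(
                "<person last_name=\"lederrey\" first_name=\"john\"/>"));
    }
}
```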

However, my program needs to preserve the order of the attributes, so that a person can easily compare the input and output document with a diff tool.

One solution is to process the document with SAX instead of DOM; see the question "Order of XML attributes after DOM processing".

However, this does not work in my case, because the transformation I need to apply to one node might depend on an XPath expression over the whole document.

So, the simplest thing would be to have an XML library very similar to the standard Java DOM library, with the exception that it preserves the attribute order.

Is there such a library?

PS: Please avoid discussing whether I should preserve the attribute order or not. This is a very interesting discussion, but it is not the point of this question.

Peter Mortensen
David Portabella
  • Is DOM giving you random attribute order if you add the input the same way (except for the values of course)? It might not give you the order you want, but it would be strange if it gave you a random order. I mean the order might not be specified, but there will be some logic in it... – Balint Bako Jul 18 '13 at 14:16
  • I think there is probably no library that tells you the order of the attributes in your source XML file. However, you might consider controlling the output, so you could create the written attributes always in the same defined order (for example sorted by name). – obecker Jul 18 '13 at 16:10
  • @obecker, I do not control the input XML file, so the order of the attributes in the input is unknown, and I cannot force an order. – David Portabella Jul 18 '13 at 19:45
  • Would it be OK to canonicalize both documents before comparing them? Just a thought... – GPI Mar 10 '16 at 09:22
  • Underscore-java library preserves attribute order while loading xml. – Valentyn Kolesnikov Mar 15 '20 at 04:46

7 Answers

3

Saxon these days offers a serialization option[1] to control the order in which attributes are output. It doesn't retain the input order (because Saxon doesn't know the input order), but it does allow you to control, for example, that the ID attribute always appears first.

And this can be very useful if the XML is going to be hand-edited; XML in which the attributes appear in the "wrong" order can be very disorienting to a human reader or editor.

If you're using this as part of a diff process then you would want to put both files through a process that normalizes the attribute order before comparing them. However, for comparing files my preferred approach is to parse them both and use the XPath deep-equal() function; or to use a specialized tool like DeltaXML.

[1] saxon:attribute-order - see http://www.saxonica.com/documentation/index.html#!extensions/output-extras/serialization-parameters
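If only the standard JDK is available, a rough approximation of the "parse both and compare" idea is DOM Level 3 `Node.isEqualNode`, which compares attributes by name and value rather than by position. This is a sketch under stated assumptions, not a substitute for XPath `deep-equal()` or a dedicated diff tool: the class and method names are hypothetical, and whitespace-only text nodes still count as differences.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Saxon or any other library.
class AttributeOrderInsensitiveCompare {

    static Document parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        doc.normalize(); // merge adjacent text nodes so they compare consistently
        return doc;
    }

    /** True if both documents have equal element trees, ignoring attribute order. */
    static boolean sameContent(String xmlA, String xmlB) throws Exception {
        return parse(xmlA).getDocumentElement()
                .isEqualNode(parse(xmlB).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sameContent(
                "<person first_name=\"john\" last_name=\"lederrey\"/>",
                "<person last_name=\"lederrey\" first_name=\"john\"/>"));
    }
}
```

This answers "are the two documents equivalent?" but does not produce a line-by-line diff.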

Michael Kay
2

Do it twice:

Read the document in using a DOM parser so you have references, a repository, if you will.

Then read it again using SAX. At the point where you need to make the transformation, reference the DOM version to determine what you need, then output what you need in the middle of the SAX stream.
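The two-pass idea can be sketched roughly as follows. The SAX handler tracks a positional XPath such as `/*[1]/*[2]` for the element it is currently on, evaluates that path against the DOM to find the matching node, and then asks the DOM a question that needs look-ahead. Everything specific here is an assumption made for illustration: the class name, the helper methods, and the rule "upper-case `last_name` when the element has a following sibling". The sketch also assumes a document without namespaces, relies on the parser reporting attributes in source order (which SAX does not guarantee), and writes empty elements as start/end tag pairs while dropping comments and processing instructions.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class illustrating the DOM-plus-SAX approach.
class DomGuidedSaxRewrite {

    static String rewrite(String xml) throws Exception {
        // Pass 1: build a DOM that serves only as a queryable reference.
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        StringBuilder out = new StringBuilder();

        // paths: positional XPath of every open element.
        // counters: number of child elements seen so far under every open element.
        Deque<String> paths = new ArrayDeque<>();
        Deque<int[]> counters = new ArrayDeque<>();
        paths.push("");
        counters.push(new int[] {0});

        // Pass 2: stream the same input and echo it, consulting the DOM per element.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                int position = ++counters.peek()[0];
                String path = paths.peek() + "/*[" + position + "]";
                paths.push(path);
                counters.push(new int[] {0});

                boolean modify = hasFollowingSibling(xpath, dom, path);
                out.append('<').append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    String value = atts.getValue(i);
                    if (modify && "last_name".equals(atts.getQName(i))) {
                        value = value.toUpperCase(Locale.ROOT); // illustrative rule only
                    }
                    out.append(' ').append(atts.getQName(i))
                       .append("=\"").append(escape(value)).append('"');
                }
                out.append('>');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(qName).append('>');
                paths.pop();
                counters.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(escape(new String(ch, start, length)));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return out.toString();
    }

    /** Looks up the DOM node at the positional path and checks for a later sibling element. */
    static boolean hasFollowingSibling(XPath xpath, Document dom, String path) {
        try {
            Node node = (Node) xpath.evaluate(path, dom, XPathConstants.NODE);
            return (Boolean) xpath.evaluate(
                    "boolean(following-sibling::*)", node, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

Calling `DomGuidedSaxRewrite.rewrite(xml)` returns the rewritten document as a string. Evaluating an XPath per element is simple but not fast; for large documents it may be worth walking the DOM in parallel with the SAX events instead.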

Bob Dalgleish
  • That would be an option, but just deciding whether to update the current node would be painful: I need to modify an element only if its parent has three children with some specific content, and one of those children could come after the current node. However, this approach would be fine if there were a way to navigate from the current position in the SAX stream to the corresponding DOM node (e.g., when SAX tells me that it is starting a new node, I should be able to build an XPath expression that I can apply to the DOM to get the corresponding node). Is there such a helper function already implemented? – David Portabella Jul 19 '13 at 12:53
  • Note, however, that I think it would be much easier if we found a library similar to the standard Java DOM that simply preserves the attribute order. – David Portabella Jul 19 '13 at 12:54
2

You might also want to try DecentXML, as it can preserve the attribute order, comments and even indentation.

It is very nice if you need to programmatically update an XML file that's also supposed to be human-editable. We use it for one of our configuration tools.

-- edit --

It seems it is no longer available at its original location; try these ones:

Haroldo_OK
0

Your best bet would be to use StAX instead of DOM for generating the original document. StAX gives you a lot of fine control over these things and lets you stream output progressively to an output stream instead of holding it all in memory.
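As a rough illustration of the streaming approach, the sketch below copies a document with the StAX cursor API (`XMLStreamReader` and `XMLStreamWriter`), writing each element's attributes by index in the order the reader reports them. The class name and the rule that upper-cases `last_name` are hypothetical, the sketch assumes a document without namespaces, and it silently drops comments and processing instructions. The StAX specification does not promise that attribute indexes follow source order, so this relies on the behaviour of the parser in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical class illustrating a StAX copy that keeps attribute order.
class StaxAttributeOrderCopy {

    static String copy(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    writer.writeStartElement(reader.getLocalName());
                    // Attributes are written in the order the reader exposes them.
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        String name = reader.getAttributeLocalName(i);
                        String value = reader.getAttributeValue(i);
                        if ("last_name".equals(name)) {
                            value = value.toUpperCase(Locale.ROOT); // illustrative rule only
                        }
                        writer.writeAttribute(name, value);
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    writer.writeEndElement();
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.SPACE:
                case XMLStreamConstants.CDATA:
                    writer.writeCharacters(reader.getText());
                    break;
                default:
                    break; // comments, processing instructions, etc. are skipped here
            }
        }
        writer.writeEndDocument();
        writer.flush();
        return out.toString();
    }
}
```

Like SAX, this only sees one element at a time, so a decision that depends on the whole document still needs a separate look-up structure.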

Mike Thomsen
0

We had similar requirements per Dave's description. A solution that worked was based on Java reflection.

The idea is to set the propOrder for the attributes at runtime. In our case there is an APP_DATA element with three attributes: app, name, and value. The generated AppData class lists only "content" in propOrder and none of the attributes:

@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "AppData", propOrder = {
    "content"
})
public class AppData {

    @XmlValue
    protected String content;
    @XmlAttribute(name = "Value", required = true)
    protected String value;
    @XmlAttribute(name = "Name", required = true)
    protected String name;
    @XmlAttribute(name = "App", required = true)
    protected String app;
    ...
}

So Java reflection was used as follows to set the order at runtime:

final String[] propOrder = { "app", "name", "value" };
ReflectionUtil.changeAnnotationValue(
        AppData.class.getAnnotation(XmlType.class),
        "propOrder", propOrder);

final JAXBContext jaxbContext = JAXBContext
        .newInstance(ADI.class);
final Marshaller adimarshaller = jaxbContext.createMarshaller();
adimarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT,
        true);

adimarshaller.marshal(new JAXBElement<ADI>(new QName("ADI"),
                                           ADI.class, adi),
                      new StreamResult(fileOutputStream));

The changeAnnotationValue() was borrowed from this post: Modify a class definition's annotation string parameter at runtime

Here's the method for your convenience (credit goes to @assylias and @Balder):

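import java.lang.annotation.Annotation;
import java.lang.reflect.Field;
import java.lang.reflect.Proxy;
import java.util.Map;

/**
 * Changes the annotation value for the given key of the given annotation to newValue and returns
 * the previous value.
 *
 * Relies on the JDK-internal "memberValues" field of the annotation's invocation handler. On JDKs
 * with strong encapsulation, setAccessible may fail unless the package is opened, e.g. with
 * --add-opens java.base/sun.reflect.annotation=ALL-UNNAMED.
 */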
@SuppressWarnings("unchecked")
public static Object changeAnnotationValue(Annotation annotation, String key, Object newValue) {
    Object handler = Proxy.getInvocationHandler(annotation);
    Field f;
    try {
        f = handler.getClass().getDeclaredField("memberValues");
    } catch (NoSuchFieldException | SecurityException e) {
        throw new IllegalStateException(e);
    }
    f.setAccessible(true);
    Map<String, Object> memberValues;
    try {
        memberValues = (Map<String, Object>) f.get(handler);
    } catch (IllegalArgumentException | IllegalAccessException e) {
        throw new IllegalStateException(e);
    }
    Object oldValue = memberValues.get(key);
    if (oldValue == null || oldValue.getClass() != newValue.getClass()) {
        throw new IllegalArgumentException();
    }
    memberValues.put(key, newValue);
    return oldValue;
}
Erikson
0

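You can subclass AttributeMap (AttributeSortedMap in the code below) and sort the attributes as you need.

The main idea: load the document, recursively copy it into elements whose attribute map is sorted, and serialize the copy using the existing XMLSerializer. Note that this relies on JDK-internal Xerces classes (com.sun.org.apache.xerces.internal.*), which are not part of the public API and may be inaccessible on newer JDKs.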

File test.xml

<root>
    <person first_name="john1" last_name="lederrey1"/>
    <person first_name="john2" last_name="lederrey2"/>
    <person first_name="john3" last_name="lederrey3"/>
    <person first_name="john4" last_name="lederrey4"/>
</root>

File AttOrderSorter.java

import com.sun.org.apache.xerces.internal.dom.AttrImpl;
import com.sun.org.apache.xerces.internal.dom.AttributeMap;
import com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl;
import com.sun.org.apache.xerces.internal.dom.ElementImpl;
import com.sun.org.apache.xml.internal.serialize.OutputFormat;
import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
import org.w3c.dom.*;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.Writer;
import java.util.List;

import static java.util.Arrays.asList;

public class AttOrderSorter {

    private List<String> sortAtts = asList("last_name", "first_name");

    public void format(String inFile, String outFile) throws Exception {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbFactory.newDocumentBuilder();
        Document outDocument = builder.newDocument();
        try (FileInputStream inputStream = new FileInputStream(inFile)) {
            Document document = dbFactory.newDocumentBuilder().parse(inputStream);
            Element sourceRoot = document.getDocumentElement();
            Element outRoot = outDocument.createElementNS(sourceRoot.getNamespaceURI(), sourceRoot.getTagName());
            outDocument.appendChild(outRoot);

            copyAtts(sourceRoot.getAttributes(), outRoot);
            copyElement(sourceRoot.getChildNodes(), outRoot, outDocument);
        }

        try (Writer outxml = new FileWriter(new File(outFile))) {

            OutputFormat format = new OutputFormat();
            format.setLineWidth(0);
            format.setIndenting(false);
            format.setIndent(2);

            XMLSerializer serializer = new XMLSerializer(outxml, format);
            serializer.serialize(outDocument);
        }
    }

    private void copyElement(NodeList nodes, Element parent, Document document) {
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element element = new ElementImpl((CoreDocumentImpl) document, node.getNodeName()) {
                    @Override
                    public NamedNodeMap getAttributes() {
                        return new AttributeSortedMap(this, (AttributeMap) super.getAttributes());
                    }
                };
                copyAtts(node.getAttributes(), element);
                copyElement(node.getChildNodes(), element, document);

                parent.appendChild(element);
            }
        }
    }

    private void copyAtts(NamedNodeMap attributes, Element target) {
        for (int i = 0; i < attributes.getLength(); i++) {
            Node att = attributes.item(i);
            target.setAttribute(att.getNodeName(), att.getNodeValue());
        }
    }

    public class AttributeSortedMap extends AttributeMap {
        AttributeSortedMap(ElementImpl element, AttributeMap attributes) {
            super(element, attributes);
            nodes.sort((o1, o2) -> {
                AttrImpl att1 = (AttrImpl) o1;
                AttrImpl att2 = (AttrImpl) o2;

                Integer pos1 = sortAtts.indexOf(att1.getNodeName());
                Integer pos2 = sortAtts.indexOf(att2.getNodeName());
                if (pos1 > -1 && pos2 > -1) {
                    return pos1.compareTo(pos2);
                } else if (pos1 > -1 || pos2 > -1) {
                    return pos1 == -1 ? 1 : -1;
                }
                return att1.getNodeName().compareTo(att2.getNodeName());
            });
        }
    }

    // main must be static to serve as the program entry point
    public static void main(String[] args) throws Exception {
        new AttOrderSorter().format("src/main/resources/test.xml", "src/main/resources/output.xml");
    }
}

Result - file output.xml

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <person last_name="lederrey1" first_name="john1"/>
  <person last_name="lederrey2" first_name="john2"/>
  <person last_name="lederrey3" first_name="john3"/>
  <person last_name="lederrey4" first_name="john4"/>
</root>
IvanNik
-1

You can't use the DOM, but you can use SAX, or query the children using XPath.

See the answers to "Order of XML attributes after DOM processing".

fla