Compare two documents where both parent elements and child elements are ordered differently

Question

I'm trying to unit test some methods that produce XML. I have an expected XML string and the result string, and after googling and searching Stack Overflow I found XMLUnit. However, it doesn't seem to handle one particular case: repeating elements that appear in a different order and whose child elements are also in a different order. For example:

Expected XML:

<graph>
  <parent>
    <foo>David</foo>
    <bar>Rosalyn</bar>
  </parent>
  <parent>
    <bar>Alexander</bar>
    <foo>Linda</foo>
  </parent>
</graph>

Actual XML:

<graph>
  <parent>
    <foo>Linda</foo>
    <bar>Alexander</bar>
  </parent>
  <parent>
    <bar>Rosalyn</bar>
    <foo>David</foo>
  </parent>
</graph>

You can see that the parent node repeats and that its contents can be in any order. These two pieces of XML should be equivalent, but nothing from the Stack Overflow examples I've seen does the trick. (Best way to compare 2 XML documents in Java) (How can I compare two similar XML files in XMLUnit)

I've resorted to creating Documents from the XML strings, stepping through each expected parent node, and comparing it to each actual parent node to see if one of them is equivalent.

It seems like a lot of reinventing the wheel for something that should be a relatively common comparison. XMLUnit seems to do a lot, and perhaps I've missed something, but from what I can tell it falls short in this particular case.

Is there an easier/better way to do this?

My Solution:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
// parse and normalize expected xml
Document expectedXMLDoc = db.parse(new ByteArrayInputStream(resultXML.getBytes()));
expectedXMLDoc.normalizeDocument();
// parse and normalize actual xml
Document actualXMLDoc = db.parse(new ByteArrayInputStream(actual.getXml().getBytes()));
actualXMLDoc.normalizeDocument();
// expected and actual parent nodes
NodeList expectedParentNodes = expectedXMLDoc.getLastChild().getChildNodes();
NodeList actualParentNodes = actualXMLDoc.getLastChild().getChildNodes();

// assert same amount of nodes in actual and expected
assertEquals("actual XML does not have expected amount of Parent nodes", expectedParentNodes.getLength(), actualParentNodes.getLength());

// loop through expected parent nodes
for(int i=0; i < expectedParentNodes.getLength(); i++) {
    // create doc from node
    Node expectedParentNode = expectedParentNodes.item(i);    
    Document expectedParentDoc = db.newDocument();
    Node importedExpectedNode = expectedParentDoc.importNode(expectedParentNode, true);
    expectedParentDoc.appendChild(importedExpectedNode);

    boolean hasSimilar = false;
    StringBuilder  messages = new StringBuilder();

    // for each expected parent, find a similar parent
    for(int j=0; j < actualParentNodes.getLength(); j++) {
        // create doc from node
        Node actualParentNode = actualParentNodes.item(j);
        Document actualParentDoc = db.newDocument();
        Node importedActualNode = actualParentDoc.importNode(actualParentNode, true);
        actualParentDoc.appendChild(importedActualNode);

        // XMLUnit Diff
        Diff diff = new Diff(expectedParentDoc, actualParentDoc);
        messages.append(diff.toString());
        boolean similar = diff.similar();
        if(similar) {
            hasSimilar = true;
        }
    }
    // assert it found a similar parent node
    assertTrue("expected and actual XML nodes are not equivalent " + messages, hasSimilar);        
}

Jim Garrison · Answer 1 · 2014-02-07T18:43:26.907

Use an XSL identity transform with an added <xsl:sort.../> to reorder the nodes in each document by name, then compare the sorted output. You may need to get a little tricky with specific sort keys for certain nodes (e.g. the top-level parent nodes) so that they sort on their inner contents.

Here's a skeleton to get you started:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

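    <!-- Identity Transform -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <!-- Copy attributes first: adding an attribute after a child node is an error -->
            <xsl:apply-templates select="@*">
                <xsl:sort select="name(.)"/>
            </xsl:apply-templates>
            <xsl:apply-templates select="node()">
                <xsl:sort select="name(.)"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

    <!-- Special handling for graph/parent nodes -->
    <xsl:template match="graph">
        <!-- Copy the graph element itself so the output keeps a single root -->
        <xsl:copy>
            <!-- Copy attributes using the identity template above -->
            <xsl:apply-templates select="@*"/>
            <!-- Sort parent nodes by text of bar node -->
            <xsl:apply-templates select="parent">
                <xsl:sort select="bar/text()"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>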
</xsl:stylesheet>

This works for the samples you posted. Adjust as necessary for the real data.
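
A minimal sketch of how a sorting stylesheet like this might be applied from Java using the JDK's javax.xml.transform API, with the two sorted outputs then compared as plain strings. The class name SortedXmlCompare and its methods are illustrative and not part of the answer. The inlined stylesheet is a variant of the skeleton that additionally strips whitespace-only text and keeps the graph root element, and it assumes every parent has a distinct bar value to sort on.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SortedXmlCompare {

    // Sorting stylesheet (assumption: element names "graph", "parent", "bar" as in the question).
    static final String SORT_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
        + "<xsl:strip-space elements='*'/>"
        // identity transform: attributes first, then child nodes sorted by name
        + "<xsl:template match='@*|node()'><xsl:copy>"
        + "<xsl:apply-templates select='@*'><xsl:sort select='name(.)'/></xsl:apply-templates>"
        + "<xsl:apply-templates select='node()'><xsl:sort select='name(.)'/></xsl:apply-templates>"
        + "</xsl:copy></xsl:template>"
        // graph: sort the repeated parent elements by the text of their bar child
        + "<xsl:template match='graph'><xsl:copy>"
        + "<xsl:apply-templates select='@*'/>"
        + "<xsl:apply-templates select='parent'><xsl:sort select='bar/text()'/></xsl:apply-templates>"
        + "</xsl:copy></xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the sorting stylesheet over the given XML and returns the serialized result. */
    static String sortXml(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(SORT_XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not transform XML", e);
        }
    }

    /** Two documents count as equivalent when their sorted forms are identical strings. */
    static boolean equivalent(String expectedXml, String actualXml) {
        return sortXml(expectedXml).equals(sortXml(actualXml));
    }
}
```

In a JUnit test this could be used as assertEquals(SortedXmlCompare.sortXml(expectedXml), SortedXmlCompare.sortXml(actualXml)), so that a failure message shows both normalized documents.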

This seems like it will cut down on a lot of my code, which I like. Would I be able to create an XSL generic enough to sort any XML document? — Munick, Feb 07 '14 at 18:42
You can't because you still have to resolve how to sort a series of elements with the same name. How to sort those will depend on the inner data that determines a unique sort order, and that will depend on the actual document structure. — Jim Garrison, Feb 07 '14 at 18:44
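
Outside XSLT, one possible way around the per-document sort key is to sort siblings by their entire content rather than by a chosen field. The sketch below is an illustration of that idea in plain Java DOM, not something proposed in this thread: it builds a canonical string for each node in which attributes and child nodes are sorted, so two documents are equivalent when their canonical strings match. The class name CanonicalXml is hypothetical, comments and whitespace-only text are ignored by assumption, and no escaping is done, so it is not suitable for adversarial input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CanonicalXml {

    /** Returns an order-independent string form of an element or text node. */
    static String canonical(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            return "#text(" + node.getNodeValue() + ")";
        }
        StringBuilder sb = new StringBuilder("<").append(node.getNodeName());

        // attributes, sorted by their "name=value" rendering
        List<String> attributes = new ArrayList<String>();
        NamedNodeMap attrs = node.getAttributes();
        for (int i = 0; attrs != null && i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            attributes.add(" " + attr.getNodeName() + "=\"" + attr.getNodeValue() + "\"");
        }
        Collections.sort(attributes);
        for (String attribute : attributes) {
            sb.append(attribute);
        }
        sb.append(">");

        // children, each reduced to its canonical form and then sorted
        List<String> children = new ArrayList<String>();
        NodeList nodes = node.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node child = nodes.item(i);
            short childType = child.getNodeType();
            boolean isText = childType == Node.TEXT_NODE || childType == Node.CDATA_SECTION_NODE;
            if (isText && child.getNodeValue().trim().isEmpty()) {
                continue; // skip indentation between elements
            }
            if (isText || childType == Node.ELEMENT_NODE) {
                children.add(canonical(child));
            }
        }
        Collections.sort(children);
        for (String child : children) {
            sb.append(child);
        }
        return sb.append("</").append(node.getNodeName()).append(">").toString();
    }

    /** Parses both strings and compares the canonical forms of their root elements. */
    static boolean equivalent(String xml1, String xml2) {
        return canonical(parseRoot(xml1)).equals(canonical(parseRoot(xml2)));
    }

    private static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

The trade-off is that this treats every element as unordered. A document where some sequences are meaningful would need extra handling.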

score 1 · Accepted Answer · answered Jul 09 '14 at 18:33

Just realized I hadn't selected an answer for this. I ended up using something very similar to my solution. Here's the final solution that worked for me. I've wrapped it up in a class to use with junit so the methods can be used like any other junit assertion.

If child order does not matter anywhere, as in my case, you can run

assertEquivalentXml(expectedXML, testXML, null, null);

If the children of some nodes must be compared in order and/or the values of some attributes need to be ignored:

assertEquivalentXml(expectedXML, testXML,
                new String[]{"dataset", "categories"}, new String[]{"color", "anchorBorderColor", "anchorBgColor"});

Here's the class:

/**
 * A set of methods that assert XML equivalence specifically for XmlProvider classes. Extends 
 * <code>junit.framework.Assert</code>, meaning that these methods are recognised as assertions by junit.
 *
 * @author munick
 */
public class XmlProviderAssertions extends Assert {    

    /**
     * Asserts two xml strings are equivalent. Nodes are not expected to be in order. Order can be compared among the 
     * children of the top parent node by adding their names to nodesWithOrderedChildren 
     * (e.g. in <graph><dataset><set value="1"/><set value="2"/></dataset></graph> the top parent node is graph 
     * and we can expect the children of dataset to be in order by adding "dataset" to nodesWithOrderedChildren).
     * 
     * All attribute names and values are compared unless their name is in attributesToIgnore in which case only the 
     * name is compared and any difference in value is ignored.
     * 
     * @param expectedXML the expected xml string 
     * @param testXML the xml string being tested
     * @param nodesWithOrderedChildren names of nodes whose children should be in order
     * @param attributesToIgnore names of attributes whose values should be ignored
     */
    public static void assertEquivalentXml(String expectedXML, String testXML, String[] nodesWithOrderedChildren, String[] attributesToIgnore) {
        Set<String> setOfNodesWithOrderedChildren = new HashSet<String>();
        if(nodesWithOrderedChildren != null ) {
            Collections.addAll(setOfNodesWithOrderedChildren, nodesWithOrderedChildren);
        }

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setCoalescing(true);
        dbf.setIgnoringElementContentWhitespace(true);
        dbf.setIgnoringComments(true);
        DocumentBuilder db = null;
        try {
            db = dbf.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            fail("Error testing XML");
        }

        Document expectedXMLDoc = null;
        Document testXMLDoc = null;
        try {
            expectedXMLDoc = db.parse(new ByteArrayInputStream(expectedXML.getBytes()));
            expectedXMLDoc.normalizeDocument();

            testXMLDoc = db.parse(new ByteArrayInputStream(testXML.getBytes()));
            testXMLDoc.normalizeDocument();
        } catch (SAXException e) {
            fail("Could not parse testXML");
        } catch (IOException e) {
            fail("Could not read testXML");
        }
        NodeList expectedChildNodes = expectedXMLDoc.getLastChild().getChildNodes();
        NodeList testChildNodes = testXMLDoc.getLastChild().getChildNodes();

        assertEquals("Test XML does not have expected amount of child nodes", expectedChildNodes.getLength(), testChildNodes.getLength());

        //compare parent nodes        
        Document expectedDEDoc = getNodeAsDocument(expectedXMLDoc.getDocumentElement(), db, false);        
        Document testDEDoc = getNodeAsDocument(testXMLDoc.getDocumentElement(), db, false);
        Diff diff = new Diff(expectedDEDoc, testDEDoc);
        assertTrue("Test XML parent node doesn't match expected XML parent node. " + diff.toString(), diff.similar());

        // compare child nodes
        for(int i=0; i < expectedChildNodes.getLength(); i++) {
            // expected child node
            Node expectedChildNode = expectedChildNodes.item(i);
            // skip text nodes
            if( expectedChildNode.getNodeType() == Node.TEXT_NODE ) {
                continue;
            }
            // convert to document to use in Diff
            Document expectedChildDoc = getNodeAsDocument(expectedChildNode, db, true);

            boolean hasSimilar = false;
            StringBuilder  messages = new StringBuilder();

            for(int j=0; j < testChildNodes.getLength(); j++) {
                // find child node in test xml
                Node testChildNode = testChildNodes.item(j);
                // skip text nodes
                if( testChildNode.getNodeType() == Node.TEXT_NODE ) {
                    continue;
                }
                // create doc from node
                Document testChildDoc = getNodeAsDocument(testChildNode, db, true);

                diff = new Diff(expectedChildDoc, testChildDoc);
                // if it doesn't contain order specific nodes, then use the elem and attribute qualifier, otherwise use the default
                if( !setOfNodesWithOrderedChildren.contains( expectedChildDoc.getDocumentElement().getNodeName() ) ) {
                    diff.overrideElementQualifier(new ElementNameAndAttributeQualifier());
                }
                if(attributesToIgnore != null) {
                    diff.overrideDifferenceListener(new IgnoreNamedAttributesDifferenceListener(attributesToIgnore));
                }
                messages.append(diff.toString());
                boolean similar = diff.similar();
                if(similar) {
                    hasSimilar = true;
                }
            }
            assertTrue("Test XML does not match expected XML. " + messages, hasSimilar);
        }
    }

    private static Document getNodeAsDocument(Node node, DocumentBuilder db, boolean deep) {
        // create doc from node
        Document nodeDoc = db.newDocument();
        Node importedNode = nodeDoc.importNode(node, deep);
        nodeDoc.appendChild(importedNode);
        return nodeDoc;
    }

}

/**
 * Custom difference listener that ignores differences in attribute values for specified attribute names. Used to 
 * ignore color attribute differences in FusionChartXml equivalence.
 */
class IgnoreNamedAttributesDifferenceListener implements DifferenceListener {
    Set<String> attributeBlackList;

    public IgnoreNamedAttributesDifferenceListener(String[] attributeNames) {        
        attributeBlackList = new HashSet<String>();
        Collections.addAll(attributeBlackList, attributeNames);
    }

    public int differenceFound(Difference difference) {
        int differenceId = difference.getId();
        if (differenceId == DifferenceConstants.ATTR_VALUE_ID) {
            if(attributeBlackList.contains(difference.getControlNodeDetail().getNode().getNodeName())) {
                return DifferenceListener.RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
            }
        }

        return DifferenceListener.RETURN_ACCEPT_DIFFERENCE;
    }

    public void skippedComparison(Node node, Node node1) {
        // left empty
    }
}

score 0 · Answer 3 · answered Feb 07 '14 at 19:24

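You can use a recursive function that works for any XML structure where the order of elements is not important. Here is a sketch in Java (attributes are not compared):

import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public static boolean isEqual(Node node1, Node node2) {
    // nodes must be of the same type and have the same name
    if (node1.getNodeType() != node2.getNodeType()
            || !node1.getNodeName().equals(node2.getNodeName())) {
        return false;
    }
    // values must match (getNodeValue() is null for elements)
    String value1 = node1.getNodeValue();
    String value2 = node2.getNodeValue();
    if (value1 == null ? value2 != null : !value1.equals(value2)) {
        return false;
    }
    List<Node> children1 = children(node1);
    List<Node> children2 = children(node2);
    if (children1.size() != children2.size()) {
        return false;
    }
    // pair each child of node1 with a not-yet-matched equal child of node2
    for (Node child1 : children1) {
        boolean matchFound = false;
        for (int i = 0; i < children2.size(); i++) {
            if (isEqual(child1, children2.get(i))) {
                children2.remove(i); // remove the matched node so it is not reused
                matchFound = true;
                break;
            }
        }
        if (!matchFound) {
            return false;
        }
    }
    // every child found a partner (also covers nodes without children)
    return true;
}

// Copies the child nodes into a list, skipping whitespace-only text nodes.
private static List<Node> children(Node node) {
    List<Node> result = new ArrayList<Node>();
    NodeList nodes = node.getChildNodes();
    for (int i = 0; i < nodes.getLength(); i++) {
        Node child = nodes.item(i);
        boolean blankText = child.getNodeType() == Node.TEXT_NODE
                && child.getNodeValue().trim().isEmpty();
        if (!blankText) {
            result.add(child);
        }
    }
    return result;
}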

This is essentially what XMLUnit does for me. It also handles checking attributes and tracking differences. — Munick, Feb 07 '14 at 19:37
Yes, XMLUnit should do it! But sometimes you can debug your own simple method more easily than a complicated library. If you want to find your problem with XMLUnit, overriding the differenceFound method (http://xmlunit.sourceforge.net/api/org/custommonkey/xmlunit/Diff.html#differenceFound%28org.custommonkey.xmlunit.Difference%29) may help. — Mohammad Ali Bozorgzadeh, Feb 07 '14 at 20:00
