
I have two XML files. The first XML has a bunch of nodes that should also be present in the second XML. The second XML might have a few extra nodes as well. I need a Java-based program that can automate this check, i.e. given two XML files, it should tell me whether all the nodes of the first file are present in the second XML.

I am looking at Java + XMLUnit. However, XMLUnit does not have an exact solution for this. Help please.

Thanks.

partha
  • http://stackoverflow.com/questions/141993/best-way-to-compare-2-xml-documents-in-java this didn't help? – Kazekage Gaara Jun 03 '12 at 16:34
  • I am afraid not. I believe what that link is trying to achieve is comparison of two XMLs which are expected to be equal. I am trying to assert that one XML is a subset of another one. Slightly different. But proving to be a tough one to crack. – partha Jun 03 '12 at 16:38
  • Well, [this](http://stackoverflow.com/a/142004/828625) answer on that question says that with that particular library you can convert your XMLs into Strings, and then use String operations to easily determine whether one is a subset of the other. – Kazekage Gaara Jun 03 '12 at 16:48
  • Are the two XMLs totally different every time? Or are they instances of a common schema? If it's the second case, you could create an XSD and use it to validate the files. – loscuropresagio Jun 03 '12 at 16:49
  • @KazekageGaara - I see where the misunderstanding is. The second xml in my case can have new nodes that might be interspersed within the original set of nodes that are there in the first XML. So, the string approach is not going to work. Tell me if I am missing something. – partha Jun 03 '12 at 16:56
  • @loscuropresagio - No. They are not totally different. They are largely the same, but the second XML has a few extra nodes that can appear anywhere. I am not looking to confirm that they are semantically correct, so validating them against a common schema might not work. Please let me know if I am missing anything. – partha Jun 03 '12 at 16:58
  • @partha please check those answers again, open those links and read what is written in them. For starters, read [this](http://xmlunit.sourceforge.net/). – Kazekage Gaara Jun 03 '12 at 17:02
  • Can you give an example of your first XML and your second XML? It might make it clearer what you are trying to achieve. – kjp Jun 03 '12 at 17:18



Here is some sample code from XMLUnit.

The first test compares the structure of two XMLs while ignoring differences in text and attribute values:

    // XMLUnit 1.x (org.custommonkey.xmlunit); these test methods live in a class extending XMLTestCase
    public void testCompareToSkeletonXML() throws Exception {
        String myControlXML = "<location><street-address>22 any street</street-address><postcode>XY00 99Z</postcode></location>";
        String myTestXML = "<location><street-address>20 east cheap</street-address><postcode>EC3M 1EB</postcode></location>";
        // Ignore differences in text and attribute values, so only the structure ("skeleton") is compared
        DifferenceListener myDifferenceListener = new IgnoreTextAndAttributeValuesDifferenceListener();
        Diff myDiff = new Diff(myControlXML, myTestXML);
        myDiff.overrideDifferenceListener(myDifferenceListener);
        assertTrue("test XML matches control skeleton XML " + myDiff, myDiff.similar());
    }

You can compare one XML against the other (treating one as the skeleton XML) to find out whether one is a subset of the other.
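As a library-free alternative to the skeleton comparison, here is a minimal sketch of the subset check using only the JDK's DOM API (not XMLUnit). It assumes that "present" means the following: the element name matches, every attribute carries the same value, the element's own text matches, and children appear in the same relative order. Extra elements in the second document are allowed anywhere. The class and method names (XmlSubsetCheck, isSubset, parse) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlSubsetCheck {

    /** Parses an XML string into a DOM document (hypothetical helper). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Element children only; text, comments and whitespace are skipped. */
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) children.item(i));
            }
        }
        return result;
    }

    /** Concatenated, trimmed text that sits directly inside the element. */
    static String ownText(Element element) {
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            short type = children.item(i).getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(children.item(i).getNodeValue());
            }
        }
        return text.toString().trim();
    }

    /**
     * True if every node of 'small' is present in 'big': same element name, every
     * attribute of 'small' exists in 'big' with the same value, same own text, and
     * the children of 'small' match distinct children of 'big' in the same relative
     * order. Extra elements in 'big' may appear anywhere.
     */
    static boolean isSubset(Element small, Element big) {
        if (!small.getTagName().equals(big.getTagName())) {
            return false;
        }
        NamedNodeMap attributes = small.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attr = attributes.item(i);
            if (!big.hasAttribute(attr.getNodeName())
                    || !big.getAttribute(attr.getNodeName()).equals(attr.getNodeValue())) {
                return false;
            }
        }
        if (!ownText(small).equals(ownText(big))) {
            return false;
        }
        // Greedy in-order matching: take the earliest child of 'big' that contains
        // the current child of 'small', skipping any extra children in between.
        List<Element> bigChildren = childElements(big);
        int next = 0;
        for (Element child : childElements(small)) {
            while (next < bigChildren.size() && !isSubset(child, bigChildren.get(next))) {
                next++;
            }
            if (next == bigChildren.size()) {
                return false; // no remaining child of 'big' contains this child
            }
            next++;
        }
        return true;
    }

    public static void main(String[] args) {
        Element first = parse("<book id=\"1\"><title>XML</title><price>30</price></book>")
                .getDocumentElement();
        Element second = parse("<book id=\"1\" lang=\"en\"><title>XML</title>"
                + "<author>Someone</author><price>30</price></book>").getDocumentElement();
        System.out.println(isSubset(first, second)); // true: second only adds lang and author
        System.out.println(isSubset(second, first)); // false: first has no lang attribute
    }
}
```

Taking the earliest matching child is safe here: if any order-preserving match exists, the greedy choice never uses up a child that a later match would need. If order should not matter, the inner loop would need a proper matching step instead.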

If the skeleton comparison isn't satisfactory, there is another method that finds all the differences between two XMLs:

    public void testAllDifferences() throws Exception {
        String myControlXML = "<news><item id=\"1\">War</item>"
            + "<item id=\"2\">Plague</item><item id=\"3\">Famine</item></news>";
        String myTestXML = "<news><item id=\"1\">Peace</item>"
            + "<item id=\"2\">Health</item><item id=\"3\">Plenty</item></news>";
        // DetailedDiff collects every difference instead of stopping at the first one
        DetailedDiff myDiff = new DetailedDiff(compareXML(myControlXML, myTestXML));
        List allDifferences = myDiff.getAllDifferences();
        // With these inputs the assertion fails; its message (myDiff.toString()) lists every difference
        assertEquals(myDiff.toString(), 0, allDifferences.size());
    }

See the XMLUnit documentation for more.

Kazekage Gaara

First things first: let me go on record and say that XMLUnit is a gem. I loved it. If you are unit testing XML values, attributes, structure, etc., chances are that you will find a ready-made solution in XMLUnit. It is a good place to start.

It is also quite extensible. Out of the box, it offers an identity check (Diff.identical(): the XMLs have the same elements and attributes in the same order) and a similarity check (Diff.similar(): the XMLs have the same elements and attributes regardless of order).

However, my use case was slightly different. I had a big-ish XML (a few hundred nodes) and a large batch of XML files (around 350,000 of them). I needed to skip certain nodes during the comparison, and I could identify those nodes with XPath. They were not always in the same position in the XML, but there was a generic way of identifying them with XPath. Sometimes a node had to be ignored based on the value of some other node. To give some idea:

  1. The logic is on the node that I want to ignore, i.e. price: /bookstore/book[price>35]/price

  2. The logic is on a node at a relative position. I want to ignore author based on the value of price, and the two are related by position: /bookstore/book[price=30]/./author
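A quick way to see what each rule selects is to run it with the JDK's built-in javax.xml.xpath API. This is a small sketch on a made-up bookstore document; the sample data and the names IgnoreRulesDemo, parse and select are hypothetical. It only reports which nodes each expression matches and does not mask anything.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class IgnoreRulesDemo {

    // Hypothetical sample data in the shape the two rules expect.
    static final String BOOKSTORE = "<bookstore>"
            + "<book><title>A</title><author>Ann</author><price>30</price></book>"
            + "<book><title>B</title><author>Bob</author><price>40</price></book>"
            + "<book><title>C</title><author>Cy</author><price>20</price></book>"
            + "</bookstore>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns the text content of every node the XPath expression selects. */
    static List<String> select(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(BOOKSTORE);
        // Rule 1: the condition is on the ignored node itself (prices above 35).
        System.out.println(select(doc, "/bookstore/book[price>35]/price")); // [40]
        // Rule 2: the condition is on a sibling (authors of books priced at 30).
        System.out.println(select(doc, "/bookstore/book[price=30]/./author")); // [Ann]
    }
}
```

The ./ step in the second rule is redundant, because "." is the context node itself. /bookstore/book[price=30]/author selects the same nodes.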

After much tinkering, I settled on a low-tech solution. Before using XMLUnit to compare the files, I used XPath to mask the values of the nodes that were to be ignored.

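    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.List;
    import java.util.Set;

    import org.jdom2.Document;
    import org.jdom2.Element;
    import org.jdom2.JDOMException;
    import org.jdom2.filter.Filters;
    import org.jdom2.input.SAXBuilder;
    import org.jdom2.output.XMLOutputter;
    import org.jdom2.xpath.XPathExpression;
    import org.jdom2.xpath.XPathFactory;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class XmlMasker {
        // Assumption: SLF4J; the original does not show which logger it uses.
        private static final Logger logger = LoggerFactory.getLogger(XmlMasker.class);

        public static int massageData(File xmlFile, Set<String> xpaths, String mask)
                throws JDOMException, IOException {
            logger.debug("Data massaging started for " + xmlFile.getAbsolutePath());
            int counter = 0;
            Document doc = new SAXBuilder().build(xmlFile);

            for (String xpath : xpaths) {
                logger.debug(xpath);
                XPathExpression<Element> xpathInstance = XPathFactory.instance()
                        .compile(xpath, Filters.element());
                // JDOM 2 returns an empty list (never null) when nothing matches.
                List<Element> elements = xpathInstance.evaluate(doc);
                if (elements.size() > 1) {
                    logger.warn("Multiple matches were found for " + xpath
                            + " in " + xmlFile.getAbsolutePath()
                            + ". This could be a *potential* error.");
                }
                for (Element element : elements) {
                    logger.debug(element.getText());
                    element.setText(mask); // replace the value that should be ignored
                    counter++;
                }
            }
            // Assumption (not in the original): write the masked document back so
            // that the later XMLUnit comparison sees the masked values.
            try (FileOutputStream out = new FileOutputStream(xmlFile)) {
                new XMLOutputter().output(doc, out);
            }
            return counter;
        }
    }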
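The same mask-then-compare idea can also be sketched with the JDK alone. This version uses javax.xml.xpath in place of JDOM and the DOM method Node.isEqualNode in place of XMLUnit. Unlike XMLUnit's similar(), that method is a strict, order-sensitive equality check. The names MaskThenCompare, mask and sameAfterMasking are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MaskThenCompare {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Replaces the text of every node matched by any of the XPaths with 'mask'
     * and returns the number of matches. All matches are collected before anything
     * is changed, so a rule that tests a value (e.g. price>35) is not affected by
     * another rule having already masked that value.
     */
    static int mask(Document doc, Set<String> xpaths, String mask) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<Node> matches = new ArrayList<>();
        try {
            for (String expression : xpaths) {
                NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    matches.add(nodes.item(i));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath in " + xpaths, e);
        }
        for (Node node : matches) {
            node.setTextContent(mask);
        }
        return matches.size();
    }

    /** Masks both documents, then compares their root elements with Node.isEqualNode. */
    static boolean sameAfterMasking(String a, String b, Set<String> xpaths, String mask) {
        Document docA = parse(a);
        Document docB = parse(b);
        mask(docA, xpaths, mask);
        mask(docB, xpaths, mask);
        return docA.getDocumentElement().isEqualNode(docB.getDocumentElement());
    }

    public static void main(String[] args) {
        Set<String> ignore = Set.of("/bookstore/book[price>35]/price",
                "/bookstore/book[price=30]/author");
        String a = "<bookstore><book><author>Ann</author><price>30</price></book>"
                + "<book><author>Bob</author><price>40</price></book></bookstore>";
        String b = "<bookstore><book><author>Al</author><price>30</price></book>"
                + "<book><author>Bob</author><price>45</price></book></bookstore>";
        System.out.println(sameAfterMasking(a, b, Set.of(), "#MASKED#")); // false
        System.out.println(sameAfterMasking(a, b, ignore, "#MASKED#"));   // true
    }
}
```

Collecting every match before masking also matters for the original JDOM loop. Once a price has been replaced by the mask, a later rule that tests that price would no longer match.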

Hope this helps.

partha