
My input XML is:

        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
                "<disks-array>\n" +
                "<array-item>\n" +
                " <value>\n" +
                "<scsi>\n" +
                "<bus>0</bus>\n" +
                "<unit>0</unit>\n" +
                "</scsi>\n" +
                "<backing>\n" +
                "<vmdk_file>[909_TCUP_02] u999orcat017t/u999orcat017t.vmdk</vmdk_file>\n" +
                "<type>VMDK_FILE</type>\n" +
                "</backing>\n" +
                "<label>Hard disk 1</label>\n" +
                "<type>SCSI</type>\n" +
                "<capacity>107374182400</capacity>\n" +
                "</value>\n" +
                "<key>2000</key>\n" +
                "</array-item>\n" +
                "</disks-array>";

And the XPath filter is:

"//array-item[contains(./value/backing/vmdk_file/text(),'u999orcat017t/u999orcat017t.vmdk')]"

Here is my parsing and filtering code:

        Document doc = DocumentHelper.parseText(xml);

        XPath xp = DocumentHelper.createXPath(xpathQuery);

        // evaluate the xpath
        Object xpResult = xp.evaluate(doc);

Ideally, it should return the array-item elements whose value/backing/vmdk_file text contains the given string. However, it gives me an empty string.

I am using dom4j 1.6.1 and jaxen 1.1.1.

What is going wrong?

asolanki
  • Is this related to: https://stackoverflow.com/a/3655588/12031739 – Soc Sep 24 '19 at 04:40
  • I tried removing the `\n` characters, and your XPath works fine. – Ed Bangga Sep 24 '19 at 05:05
  • Try leaving out the `/text()`. Generally, testing the string value of an element is more robust than examining its text nodes individually. I can't see what's wrong here, but because of the way you've presented the XML, all might not be quite what it seems. – Michael Kay Sep 24 '19 at 07:41
  • I've fixed the XML; it is built by concatenating strings. – asolanki Sep 24 '19 at 08:29
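Michael Kay's point about string values can be illustrated with a small sketch. It uses the JDK's built-in org.w3c.dom and javax.xml.xpath APIs rather than dom4j (swapped in so it runs without extra jars), and it simulates a split text node with Text.splitText(); the class name StringValueDemo and the split offset 20 are hypothetical. The first text node then holds only a fragment of the path, while the element's string value (getTextContent(), or the element itself used as the first argument of contains()) still holds the whole path.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical demo class: compares a single text node with the element's
// full string value after the text has been split in two.
class StringValueDemo {
    static final String XML =
        "<disks-array><array-item><value><backing>"
        + "<vmdk_file>[909_TCUP_02] u999orcat017t/u999orcat017t.vmdk</vmdk_file>"
        + "</backing></value><key>2000</key></array-item></disks-array>";

    // Parses the XML with the JDK parser, then splits the vmdk_file text into
    // two adjacent Text nodes (offset 20 is arbitrary) to mimic the split.
    static Document parseWithSplitText() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Element file = (Element) doc.getElementsByTagName("vmdk_file").item(0);
            ((Text) file.getFirstChild()).splitText(20);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an XPath expression and returns how many nodes it selects.
    static int countMatches(Document doc, String expr) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWithSplitText();
        Element file = (Element) doc.getElementsByTagName("vmdk_file").item(0);
        // First text node only: prints "[909_TCUP_02] u999or"
        System.out.println(file.getFirstChild().getNodeValue());
        // String value of the element: all descendant text, concatenated
        System.out.println(file.getTextContent());
        // Testing the element (not text()) uses its full string value
        System.out.println(countMatches(doc,
            "//array-item[contains(value/backing/vmdk_file,"
            + " 'u999orcat017t/u999orcat017t.vmdk')]"));
    }
}
```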

1 Answer


After debugging for many hours, I finally found the root cause of the incorrect parsing: the text value is split into multiple text nodes instead of a single one.

It turns out this is a bug in the dom4j library that is still open:

https://github.com/dom4j/dom4j/issues/21

The fix is to call document.normalize(), which merges adjacent text nodes into one, before evaluating the XPath.
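The normalize step can be sketched with the JDK's built-in W3C DOM and javax.xml.xpath APIs instead of dom4j (a swapped-in API, so the sketch runs without extra jars); here org.w3c.dom.Node.normalize() plays the role of dom4j's document.normalize(). The split is simulated with Text.splitText(), and the class name NormalizeDemo and the split offset are hypothetical. The JDK's XPath engine may already treat adjacent text nodes as one, so this shows the node structure before and after normalizing rather than reproducing the dom4j/jaxen failure itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical demo class: merges split text nodes with normalize() and then
// runs the question's text()-based XPath filter.
class NormalizeDemo {
    static final String XML =
        "<disks-array><array-item><value><backing>"
        + "<vmdk_file>[909_TCUP_02] u999orcat017t/u999orcat017t.vmdk</vmdk_file>"
        + "</backing></value><key>2000</key></array-item></disks-array>";

    // Same filter as in the question.
    static final String QUERY =
        "//array-item[contains(./value/backing/vmdk_file/text(),"
        + "'u999orcat017t/u999orcat017t.vmdk')]";

    // Parses the XML, then splits the vmdk_file text into two adjacent
    // Text nodes to stand in for the split the answer describes.
    static Document parseWithSplitText() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Element file = (Element) doc.getElementsByTagName("vmdk_file").item(0);
            ((Text) file.getFirstChild()).splitText(20);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of child nodes under the first vmdk_file element.
    static int vmdkChildCount(Document doc) {
        return doc.getElementsByTagName("vmdk_file").item(0).getChildNodes().getLength();
    }

    // Number of array-item elements selected by QUERY.
    static int countMatches(Document doc) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(QUERY, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWithSplitText();
        System.out.println(vmdkChildCount(doc)); // 2: two adjacent Text nodes
        doc.normalize();                         // merge adjacent Text nodes
        System.out.println(vmdkChildCount(doc)); // 1: a single Text node
        System.out.println(countMatches(doc));   // 1: the array-item is selected
    }
}
```

In the question's dom4j code, the equivalent is a doc.normalize() call between DocumentHelper.parseText(xml) and xp.evaluate(doc).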

asolanki