XPath parsing failing with dom4j for text function

Question

My input XML is:

    String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
        "<disks-array>\n" +
          "<array-item>\n" +
            " <value>\n" +
              "<scsi>\n" +
                "<bus>0</bus>\n" +
                "<unit>0</unit>\n" +
              "</scsi>\n" +
              "<backing>\n" +
                "<vmdk_file>[909_TCUP_02] u999orcat017t/u999orcat017t.vmdk</vmdk_file>\n" +
                "<type>VMDK_FILE</type>\n" +
              "</backing>\n" +
              "<label>Hard disk 1</label>\n" +
              "<type>SCSI</type>\n" +
              "<capacity>107374182400</capacity>\n" +
            "</value>\n" +
            "<key>2000</key>\n" +
          "</array-item>\n" +
        "</disks-array>";

and the XPath filter is:

    String xpathQuery = "//array-item[contains(./value/backing/vmdk_file/text(),'u999orcat017t/u999orcat017t.vmdk')]";

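Here is my parsing and filtering code:

    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;
    import org.dom4j.XPath;

    // parse the XML string into a dom4j document
    Document doc = DocumentHelper.parseText(xml);

    // compile the XPath filter and evaluate it against the document
    XPath xp = DocumentHelper.createXPath(xpathQuery);
    Object xpResult = xp.evaluate(doc);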

I expect it to return the array-item elements whose value/backing/vmdk_file text contains the given string. However, it gives me an empty string.

I am using dom4j 1.6.1 and jaxen 1.1.1.

What is going wrong?

Is this related to: https://stackoverflow.com/a/3655588/12031739 — Soc, Sep 24 '19 at 04:40
Try leaving out the `/text()`. Generally, testing the string value of an element is more robust than examining its text nodes individually. I can't see what's wrong here, but because of the way you've presented the XML, all might not be quite what it seems. — Michael Kay, Sep 24 '19 at 07:41

score 0 · Answer 1 · edited Jun 20 '20 at 09:12

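After many hours of debugging, I finally found the root cause of the failed match: the text value of the element is split across multiple text nodes instead of being a single node. See the highlighted picture. Because contains() converts the node-set returned by text() to a string using only its first node, the comparison sees just the first fragment of the text.

This turns out to be a bug in the dom4j library, which is still open:

https://github.com/dom4j/dom4j/issues/21

The fix is to call document.normalize() before evaluating the XPath, which merges the adjacent text nodes into one.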
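
To illustrate what "split text nodes" and normalize() mean, here is a minimal sketch. It deliberately uses the JDK's built-in org.w3c.dom and javax.xml.xpath classes as a stand-in rather than dom4j, so it does not reproduce the dom4j bug itself. The class name SplitTextDemo, the trimmed XML, and the hand-made split of the text are assumptions made for the demo. It also uses the string-value form of the filter (without /text()) suggested in the comments, which reads the whole element text no matter how many text nodes it is stored in.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; JDK DOM is used here as a stand-in for dom4j.
class SplitTextDemo {
    // Trimmed-down version of the question's XML. <vmdk_file> starts empty
    // so the demo can fill it with two adjacent text nodes by hand.
    static final String XML =
        "<disks-array><array-item><value><backing><vmdk_file/></backing></value>"
        + "<key>2000</key></array-item></disks-array>";

    // Filter on the element's string value instead of its text() nodes.
    static final String QUERY =
        "//array-item[contains(value/backing/vmdk_file,"
        + "'u999orcat017t/u999orcat017t.vmdk')]";

    // Parse the XML, then store the file name as two adjacent text nodes,
    // split in the middle of the string the filter searches for.
    static Document buildSplitDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        Element file = (Element) doc.getElementsByTagName("vmdk_file").item(0);
        file.appendChild(doc.createTextNode("[909_TCUP_02] u999orcat017t/"));
        file.appendChild(doc.createTextNode("u999orcat017t.vmdk"));
        return doc;
    }

    // Number of child nodes (all of them text nodes here) under <vmdk_file>.
    static int textNodeCount(Document doc) {
        return doc.getElementsByTagName("vmdk_file").item(0).getChildNodes().getLength();
    }

    // Number of <array-item> elements selected by the string-value filter.
    static int matchCount(Document doc) throws Exception {
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(QUERY, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    // Returns {text nodes before, matches before, text nodes after, matches after}.
    static int[] demo() {
        try {
            Document doc = buildSplitDocument();
            int nodesBefore = textNodeCount(doc);
            int matchesBefore = matchCount(doc);
            // Merge adjacent text nodes throughout the tree.
            doc.getDocumentElement().normalize();
            return new int[] {nodesBefore, matchesBefore, textNodeCount(doc), matchCount(doc)};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("text nodes before normalize: " + r[0] + ", matches: " + r[1]);
        System.out.println("text nodes after normalize:  " + r[2] + ", matches: " + r[3]);
    }
}
```

In the dom4j code from the question, the corresponding step is the document.normalize() call named above, made after DocumentHelper.parseText(xml) and before the XPath is evaluated.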

answered Sep 25 '19 at 11:42 by asolanki (edited Jun 20 '20 at 09:12 by Community)