
Note: This question and its answer are valid for most/all programming languages and libraries that support XPath, not just JavaScript!

With the following code that creates a very simple HTML-page (the actual code loads a remote page, but I'm trying to put your focus on the main problem here):

var dt = document.implementation.createDocumentType("html", "-//W3C//DTD HTML 4.01 Transitional//EN", "http://www.w3.org/TR/html4/loose.dtd");
var doc = document.implementation.createDocument("http://www.w3.org/1999/xhtml", "html", dt);
var src = "<head></head><body></body>";
doc.documentElement.innerHTML = src;

alert(doc.evaluate(".", doc, null, XPathResult.ANY_UNORDERED_NODE_TYPE, null).singleNodeValue);
alert(doc.evaluate("/body", doc, null, XPathResult.ANY_UNORDERED_NODE_TYPE, null).singleNodeValue);
alert(doc.evaluate("//body", doc, null, XPathResult.ANY_UNORDERED_NODE_TYPE, null).singleNodeValue);
alert(doc.evaluate("/html", doc, null, XPathResult.ANY_UNORDERED_NODE_TYPE, null).singleNodeValue);

The first alert() shows "[object HTMLDocument]"; the other alert() calls show "null". Why is that? What am I missing to make the XPath queries work and find the body element?


EDIT:

  • added "//body" in the example
  • I guess I should mention that I use Opera 12.17. Is there any workaround that would lead me to the same result?
sjngm

1 Answer


The first XPath selects the document root (. is the current context).

The second one is null because /body asks for a root element named body, and the root element here is html. You could use:

/html/body

or

//body

This will get you the nodes. From there you can get child nodes in context using contextual XPath expressions or DOM methods and properties. To see the node names you can use the nodeName property on the node you selected:

doc.evaluate(".", doc, null, XPathResult.ANY_UNORDERED_NODE_TYPE, null)
   .singleNodeValue.nodeName;
doc.evaluate("//body", doc, null, XPathResult.ANY_UNORDERED_NODE_TYPE, null)
   .singleNodeValue.nodeName;
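As a rough sketch of the "contextual XPath" idea mentioned above, the relative expression "*" can be evaluated against an already selected node to list its element children. The helper names snapshotToArray and childElements are hypothetical and not part of the original code, and the DOM part only runs where document.evaluate is available:

```javascript
// Hypothetical helper: copies the nodes of a snapshot-type XPathResult into an array.
function snapshotToArray(result) {
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

// Hypothetical helper: evaluates "*" relative to contextNode,
// which selects the element children of that node.
function childElements(doc, contextNode) {
  var result = doc.evaluate("*", contextNode, null,
                            XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  return snapshotToArray(result);
}

// Usage: only attempted when a DOM with XPath support exists (e.g. in a browser).
if (typeof document !== "undefined" && typeof document.evaluate === "function") {
  var names = childElements(document, document.documentElement).map(function (n) {
    return n.nodeName;
  });
  console.log(names);
}
```

A snapshot result type is used here because it can be walked by index even if the document changes afterwards; an iterator type would work as well for read-only use.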

JSFiddle 1

This alternative version uses DOM to create the nodes.

var head = document.createElement("head");
var body = document.createElement("body");
doc.documentElement.appendChild(head);
doc.documentElement.appendChild(body);

It also enforces a namespace (which is ignored in Chrome in the first example), so the XPath expressions either need to include a namespace mapping function (as the third parameter of the evaluate method) or ignore namespaces (using wildcards and local-name testing, as in the example below).

doc.evaluate(".//*[local-name()='body']", doc.documentElement, null, XPathResult.ANY_UNORDERED_NODE_TYPE, null).singleNodeValue.nodeName

Note that I also used doc.documentElement as the context node.
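The other option, a namespace mapping function, might look roughly like the sketch below. The prefix "h" and the names XHTML_NS, nsResolver and findBody are arbitrary choices made for illustration; the only requirement is that the prefix used in the expression is the one the resolver maps to the XHTML namespace. The DOM part is guarded so it only runs where a document is available:

```javascript
// Assumed names for illustration only; none of them appear in the original code.
var XHTML_NS = "http://www.w3.org/1999/xhtml";

// Namespace resolver: maps a prefix used in the XPath expression to a namespace URI.
// Unknown prefixes resolve to null.
function nsResolver(prefix) {
  return prefix === "h" ? XHTML_NS : null;
}

// Selects the body element by qualifying every step with the "h" prefix.
function findBody(doc) {
  return doc.evaluate("/h:html/h:body", doc, nsResolver,
                      XPathResult.ANY_UNORDERED_NODE_TYPE, null).singleNodeValue;
}

// Usage: only attempted when a DOM is present (e.g. in a browser).
if (typeof document !== "undefined" && document.implementation) {
  var xdoc = document.implementation.createDocument(XHTML_NS, "html", null);
  xdoc.documentElement.appendChild(xdoc.createElementNS(XHTML_NS, "body"));
  var found = findBody(xdoc);
  console.log(found ? found.nodeName : null);
}
```

Compared with the local-name() test, this keeps the expression strict about which namespace is matched, at the cost of having to prefix every step.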

Try it in your browser:

JSFiddle 2

helderdarocha
  • Well, "//body" also doesn't work for me, but I guess this might depend on the browser. I'm using Opera 12.17 with no plan to update any time soon. Damn, I hate browser dependencies... – sjngm Jun 08 '14 at 19:01
  • That's strange. If you got `[object HTMLDocument]` when evaluating the root, XPath must be working, since that is correct. Did the Fiddle work in your browser? – helderdarocha Jun 08 '14 at 19:03
  • Nope, it broke at the fourth `alert()` as it returned "null". – sjngm Jun 08 '14 at 19:04
  • You could also try `/html/body` or `html/body`. The third alert should have returned `[object HTMLElement]`. If it did, then XPath works, and there may be a chance of getting the data using other apis – helderdarocha Jun 08 '14 at 19:07
  • Nah, "/html" and "html" return "null". I'm doomed :( – sjngm Jun 08 '14 at 19:17
  • I think the problem may not be with XPath. Since the root node was selected `evaluate` worked. It probably works with static HTML. The problem may be supporting `innerHTML`. A workaround would be to add the elements using W3C DOM (`createElement()`, etc.). – helderdarocha Jun 08 '14 at 19:18
  • I checked `innerHTML` and the content is there. BUT: your suggestion with the `local-name` actually worked!!!1!one! Ur awesomez!!! – sjngm Jun 08 '14 at 20:26
  • That's because the document creation actually puts the elements in a namespace ("http://www.w3.org/1999/xhtml"). When you use wildcard+local name selection you are actually *ignoring* the namespace. Another way to do it is to create a *namespace resolver* function which returns a namespace URI given a *prefix*, which you then have to use for selection (e.g. `h:html/h:body` if you use `function(prefix) {if(prefix==='h')return 'http://www.w3.org/1999/xhtml'}` as the third argument in `evaluate`). – helderdarocha Jun 08 '14 at 20:46
  • Occasionally I've been testing with a `function () { return "http://www.w3.org/1999/xhtml"; }`, but that never worked. But I never used prefixes. – sjngm Jun 08 '14 at 22:03