xPath expression: Getting elements even if they don't exist

Question

I have this xPath expression that I'm putting into htmlCleaner:

 //table[@class='StandardTable']/tbody/tr[position()>1]/td[2]/a/img

Now, my issue is that the table contents change, and sometimes the /a/img element is not present. So I would like an expression that gets all elements

//table[@class='StandardTable']/tbody/tr[position()>1]/td[2]/a/img

when /a/img is present, and

//table[@class='StandardTable']/tbody/tr[position()>1]/td[2]

when /a/img is not present.

Does anyone have any idea how to do this? I found something in another question that looks like it might help me:

descendant-or-self::*[self::body or self::span/parent::body]

but I don't understand it.

score 4 · Answer 1 · answered Dec 19 '11 at 21:13

Use:

 (//table[@class='StandardTable']
     /tbody/tr)
         [position()>1]
                   /td[2]
                       [not(a/img)] 

|

 (//table[@class='StandardTable']
     /tbody/tr)
         [position()>1]
                   /td[2]
                      /a/img
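To see what this selects, here is a minimal JAXP sketch (plain JDK, no htmlCleaner) that runs the expression above against a small, well-formed table. The sample markup and the `CellOrImageDemo` class are hypothetical illustrations; with a real page you would evaluate against the DOM built from htmlCleaner's output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CellOrImageDemo {
    // Hypothetical sample: a header row, two cells wrapping a/img, one plain-text cell.
    public static final String SAMPLE =
        "<html><body><table class='StandardTable'><tbody>"
        + "<tr><td>h1</td><td>h2</td></tr>"
        + "<tr><td>a</td><td><a href='x'><img src='1.png'/></a></td></tr>"
        + "<tr><td>b</td><td>plain text</td></tr>"
        + "<tr><td>c</td><td><a href='y'><img src='2.png'/></a></td></tr>"
        + "</tbody></table></body></html>";

    // The expression from the answer, written on one line.
    public static final String EXPR =
        "(//table[@class='StandardTable']/tbody/tr)[position()>1]/td[2][not(a/img)]"
        + " | (//table[@class='StandardTable']/tbody/tr)[position()>1]/td[2]/a/img";

    // Evaluates expr and describes each selected element:
    // "img:<src>" for images, "td:<text>" for cells.
    public static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                String detail = e.getTagName().equals("img")
                    ? e.getAttribute("src") : e.getTextContent();
                out.add(e.getTagName() + ":" + detail);
            }
            return out;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // An img where the cell has a/img, otherwise the td itself.
        System.out.println(select(SAMPLE, EXPR));
    }
}
```

For this sample the result contains the two `img` elements and the plain `td`; the header row is skipped by `position()>1`.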

In general, if we want to select one node-set ($ns1) when some condition $cond is true and to select another node-set ($ns2) otherwise, this can be specified with the following single XPath expression:

$ns1[$cond] | $ns2[not($cond)]

In this particular case, $ns1 is:

 (//table[@class='StandardTable']
     /tbody/tr)
         [position()>1]
                   /td[2]
                      /a/img

and $ns2 is:

 (//table[@class='StandardTable']
     /tbody/tr)
         [position()>1]
                   /td[2]

And $cond is (a single, document-wide test, because its path starts from the root with //):

boolean( (//table[@class='StandardTable']
         /tbody/tr)
             [position()>1]
                       /td[2]
                          /a/img
        )
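Because this $cond is evaluated once for the whole document, the literal `$ns1[$cond] | $ns2[not($cond)]` form is all-or-nothing: if any data cell contains a/img you get only images, otherwise you get every cell. The sketch below (plain JDK; the `GlobalCondDemo` class and both sample tables are hypothetical) assembles the expression from the three pieces above and runs it on a mixed table and on a table with no images.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GlobalCondDemo {
    static final String ROWS = "(//table[@class='StandardTable']/tbody/tr)[position()>1]";
    static final String NS1 = ROWS + "/td[2]/a/img";
    static final String NS2 = ROWS + "/td[2]";
    // $cond: absolute path, so it has the same value for every node it filters.
    static final String COND = "boolean(" + NS1 + ")";
    // $ns1[$cond] | $ns2[not($cond)]
    public static final String EXPR = NS1 + "[" + COND + "] | " + NS2 + "[not(" + COND + ")]";

    // Hypothetical inputs: one table mixes image and plain cells, the other has no images.
    public static final String MIXED = "<html><body><table class='StandardTable'><tbody>"
        + "<tr><td>h1</td><td>h2</td></tr>"
        + "<tr><td>a</td><td><a href='x'><img src='1.png'/></a></td></tr>"
        + "<tr><td>b</td><td>plain text</td></tr>"
        + "</tbody></table></body></html>";
    public static final String NO_IMAGES = "<html><body><table class='StandardTable'><tbody>"
        + "<tr><td>h1</td><td>h2</td></tr>"
        + "<tr><td>a</td><td>x</td></tr>"
        + "<tr><td>b</td><td>y</td></tr>"
        + "</tbody></table></body></html>";

    // Describes each selected element as "img:<src>" or "td:<text>".
    public static List<String> select(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPR, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                out.add(e.getTagName() + ":"
                    + (e.getTagName().equals("img") ? e.getAttribute("src") : e.getTextContent()));
            }
            return out;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(MIXED));     // [img:1.png]: the plain cell is dropped
        System.out.println(select(NO_IMAGES)); // every data cell (td:x and td:y)
    }
}
```

This is why the expression at the top applies `not(a/img)` to each `td[2]` individually instead of using the document-wide $cond: that way rows with and without an image can be mixed in one result.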

Dimitre Novatchev

It keeps giving me an XPatherException: **Unknown Function not** – Nacht Dec 19 '11 at 21:27

@Nacht: The "it" isn't a compliant XPath implementation. `not()` is a standard XPath function: http://www.w3.org/TR/1999/REC-xpath-19991116/#function-not – Dimitre Novatchev Dec 19 '11 at 21:34
Yeah, I just found out that htmlCleaner doesn't support boolean functions out of the box; you have to call another function called "evaluateFunction". And, as always with htmlCleaner, there's no documentation for it. -_- – Nacht Dec 19 '11 at 21:49

@Nacht: then you just need to substitute `not()` in my solution with whatever htmlCleaner accepts. Please let me know whether this final solution works for you. – Dimitre Novatchev Dec 19 '11 at 21:59

@Nacht - Better yet, just convert HTMLCleaner's output to a W3C `Document` and treat it as though it were XML from the beginning. See my updated answer. Also, +1 for Dimitre beating me to the XPath solution. – Wayne Dec 19 '11 at 22:09

Wayne · Accepted Answer · 2011-12-19T22:12:33.777

You can select the union of two mutually exclusive expressions (notice the | union operator):

//table[@class='StandardTable']/tbody/tr[position()>1]/td[2]/a/img|
//table[@class='StandardTable']/tbody/tr[position()>1]/td[2][not(a/img)]

For any given cell, only one of the two expressions can match: the first selects the a/img inside a cell that has one, and the second selects the cell itself only when it has no a/img. So you'll always get just the required nodes.

From your comments on @Dimitre's answer, I see that HTMLCleaner doesn't fully support XPath 1.0. You don't really need it to. You just need HTMLCleaner to parse input that isn't well-formed. Once it has done that job, convert its output into a standard org.w3c.dom.Document and treat it as XML.

Here's a conversion example:

TagNode tagNode = new HtmlCleaner().clean("<html><div><p>test");
Document doc = new DomSerializer(new CleanerProperties()).createDOM(tagNode);

From here on out, just use JAXP with whatever implementation you want:

XPath xpath = XPathFactory.newInstance().newXPath();
Node node = (Node) xpath.evaluate("/html/body/div/p[not(child::*)]", 
                       doc, XPathConstants.NODE);
System.out.println(node.getTextContent());

Output:

test
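One detail worth checking when you apply this to the union expression: `XPathConstants.NODE` asks for a single node, so when the expression selects several nodes you'll want `XPathConstants.NODESET` and a `NodeList`. A minimal sketch, assuming a well-formed stand-in for the converted document (the sample markup and the `NodeVsNodeSetDemo` class are hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeVsNodeSetDemo {
    // Hypothetical stand-in for the DOM that DomSerializer would produce.
    public static final String SAMPLE =
        "<html><body><table class='StandardTable'><tbody>"
        + "<tr><td>h1</td><td>h2</td></tr>"
        + "<tr><td>a</td><td><a href='x'><img src='1.png'/></a></td></tr>"
        + "<tr><td>b</td><td>plain text</td></tr>"
        + "<tr><td>c</td><td><a href='y'><img src='2.png'/></a></td></tr>"
        + "</tbody></table></body></html>";

    // The union from this answer, on one line.
    public static final String EXPR =
        "//table[@class='StandardTable']/tbody/tr[position()>1]/td[2]/a/img"
        + "|//table[@class='StandardTable']/tbody/tr[position()>1]/td[2][not(a/img)]";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // NODE returns a single Node, so the rest of the union is discarded.
    public static Node single(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate(EXPR, parse(xml), XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // NODESET returns every selected node.
    public static NodeList all(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(EXPR, parse(xml), XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("NODE:    " + single(SAMPLE).getNodeName());
        NodeList nodes = all(SAMPLE);
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println("NODESET: " + nodes.item(i).getNodeName());
        }
    }
}
```

With this sample, the `NODESET` loop reports three nodes (two `img` elements and one `td`), while `NODE` yields just one.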

Thanks for this. I can't confirm whether this works, since I am no longer working on this (the rest of the program works and I was told to move on to other things). I will come back to this when I have time. Till then, thanks! — Nacht, Dec 20 '11 at 15:49

score 0 · Answer 3 · answered Dec 19 '11 at 21:14

This is ugly and it may not even work, but the principle should:

//table[@class='StandardTable']/tbody/tr[position()>1]/td[2][a/img]/a/img | //table[@class='StandardTable']/tbody/tr[position()>1]/td[2][not(a/img)]

biziclop
