
I have an XML document with this specific structure:

<ul>
<li>
the 
<a href="http://www...">dog</a> 
is black
</li>
<li >
the
<a href="http://www....">cat</a>
is white
</li>
</ul>

But I also have this:

<ul>
<li>
the bird is blue
</li>
<li >
the
<a href="http://www....">frog</a>
</li>
</ul>

I don't know whether there is an <a> in a given <li>, or where in the <li> it sits. I would like an XPath query that returns sentences like "the dog is black", "the cat is white", "the bird is blue" and "the frog".

Thanks!

2 Answers


If you're bound to XPath 1.0, you cannot get the sentences as separate tokens. You can get all text in all list elements using

//ul//text()

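, but for the first HTML snippet this returns the individual text nodes, which read together as something like "the dog is black the cat is white", with nothing marking where one sentence ends and the next begins.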

If you need the sentences separated, retrieve the list items and put the sentences together outside XPath (e.g. in PHP, Java, or whatever you're using). How to do this differs from language to language; have a look at your library's reference, or refine this question or ask another one.

//ul/li
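To illustrate the "put the sentences together outside XPath" step, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. It is not the asker's setup: the sample markup, the example.com URLs and the helper name list_item_sentences are assumptions made for illustration. The idea carries over to other languages: select each list item, concatenate all text below it, then collapse the whitespace.

```python
import xml.etree.ElementTree as ET

# Sample markup modelled on the question; the example.com URLs are placeholders.
SAMPLE = """
<ul>
<li>
the
<a href="http://example.com/dog">dog</a>
is black
</li>
<li>
the bird is blue
</li>
<li >
the
<a href="http://example.com/frog">frog</a>
</li>
</ul>
"""


def list_item_sentences(markup):
    """Return one whitespace-normalised string per <li> element (hypothetical helper)."""
    root = ET.fromstring(markup)
    sentences = []
    # For this sample, iter("li") visits the same nodes that //ul/li selects.
    for li in root.iter("li"):
        # itertext() yields the item's own text and the text of nested
        # elements such as <a>, in document order.
        raw = "".join(li.itertext())
        # split() followed by join() collapses newlines and runs of spaces.
        sentences.append(" ".join(raw.split()))
    return sentences


print(list_item_sentences(SAMPLE))
# ['the dog is black', 'the bird is blue', 'the frog']
```

Because the text is gathered per list item, it does not matter whether an <a> is present or where inside the <li> it appears.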

With XPath 2.0 you have more luck and can use one of these queries:

//ul/li/data(.)
//ul/li/string-join(.//text(), ' ')

If the first one returns what you need, use it. If there are problems with whitespace (whitespace handling differs between implementations, but can usually be configured), go for the more flexible second query and adjust it as needed.

Jens Erat

Thanks for your reply. I use XPath in an iOS application with an HTML parser, hpple (https://github.com/topfunky/hpple). I think it uses XPath 1.0, because the log tells me the string-join function isn't recognized.

//ul//text() 

works, but it returns the text word by word, not line by line.