How can I parse the content of an HTML block while preserving the order of the strings as they appear in the HTML document, using the Hpple wrapper, which works with XPath expressions? The environment is iOS.
Example:
<html>
<body>
<div>
Lorem ipsum <a href="...">dolor</a> sit <b>amet,</b> consectetur
</div>
</body>
</html>
Let's say we want to parse all the strings inside the <div> tag in the original order so that we get this result:
Lorem ipsum dolor sit amet, consectetur
The sticking point is preserving the order of the strings. It's easy to get the direct content of <div> as well as that of <a> and <b>, separately or at the same time, using an XPath expression. That approach loses the order, however, and may put the content of <a> and <b> at the end of the string.
How can you achieve this using an XPath expression with the mentioned wrapper?
Update:
One way to achieve this with the mentioned wrapper and platform (especially libxml2) seems to be the following XPath expression:
//div/descendant-or-self::*/text()
However, the results are returned as separate elements rather than as one string, so they have to be concatenated manually.
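To illustrate the manual concatenation step, here is a minimal sketch in Python rather than with Hpple itself. It assumes the example markup is well-formed enough to be parsed as XML, and it uses the standard library's `itertext()` as a stand-in for the document-order text nodes that the XPath expression above is expected to select. The function name `div_text_in_order` is made up for this example.

```python
import xml.etree.ElementTree as ET

# The example document from the question.
HTML = """<html>
<body>
<div>
Lorem ipsum <a href="...">dolor</a> sit <b>amet,</b> consectetur
</div>
</body>
</html>"""


def div_text_in_order(markup: str) -> str:
    """Return the text inside the first <div>, in document order."""
    root = ET.fromstring(markup)
    div = root.find(".//div")
    if div is None:
        return ""
    # itertext() yields the text of the element and of all its
    # descendants in document order, one piece per text node.
    pieces = list(div.itertext())
    # Concatenate the pieces, then collapse runs of whitespace
    # (including the newlines around the <div> content).
    return " ".join("".join(pieces).split())


print(div_text_in_order(HTML))
```

With Hpple the same idea would apply to the array of results returned for the query: iterate over it in order and append each element's text to a single string.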