I need to pass some markup that is not strictly well-formed XML through an XPath evaluator. The markup is actually mostly HTML, which could look like the following:
<p>
<a href="http://www.something.com/5993810749/" title="IMG_3013”>
<img src="5993810749_107ea7d465_m.jpg" width="240" height="160" alt="IMG_3013”/>
</a>
</p>
<p>
<a href="http://www.something.com/836492365986/" title="IMG_3018”>
<img src=“8364923659_107ea3286465_m.jpg" width=“365" height=“248" alt="IMG_3018”/>
</a>
</p>
So the noticeable problems are that it has no root element and that <img> is not terminated. Wrapping it in a root element is easy, but when I then pass it through the XPath evaluator, I get an exception like:
[Fatal Error] :7:196: The element type "img" must be terminated by the matching end-tag "</img>".
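The `[Fatal Error] :7:196:` prefix is printed by the default error handler of the JDK's XML parser: an empty system ID, then the line (7) and column (196) where the fatal error was detected. Below is a minimal sketch of the wrapping step plus an up-front well-formedness check. It assumes the fragment is held in a `String`. The root name `root` and the class and method names are illustrative choices, not anything the question prescribes:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    /** Wraps an HTML fragment in a single (arbitrarily named) root element. */
    static String wrap(String fragment) {
        return "<root>" + fragment + "</root>";
    }

    /** Returns null if xml is well-formed, otherwise "line:column: message" for the first error. */
    static String firstError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Report problems through exceptions instead of the default stderr printer
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O failures are not well-formedness errors
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An <img> that is not self-closed, as described in the question
        String fragment = "<p><a href=\"x\"><img src=\"a.jpg\"></a></p>";
        System.out.println(firstError(wrap(fragment)));
    }
}
```

Installing an explicit `ErrorHandler` also stops the parser from printing the `[Fatal Error]` line to standard error, because the error is then reported only through the exception.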
By the way, the Java code for the XPath evaluation looks like this:
XPath xPath = XPathFactory.newInstance().newXPath();
// xpath is the expression string; xmlString is the markup wrapped in a root element
Object result = xPath.evaluate(xpath,
        new InputSource(new StringReader(xmlString)), XPathConstants.NODESET);
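For reference, here is a self-contained version of the evaluation snippet with the imports it needs. It parses the string into a DOM `Document` first and evaluates against that node. This makes it visible that the exception comes from the XML parser, not from XPath: `XPath.evaluate` with an `InputSource` also parses the text as XML before any expression is evaluated. The class name, the `//img/@src` expression and the sample string are assumptions for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathOverDom {

    /** Parses well-formed XML into a DOM tree and returns the nodes matched by the expression. */
    static NodeList select(String xmlString, String xpath) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // safer default when the tree is queried with XPath
            // Parsing happens here; malformed markup fails before XPath is involved
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlString)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            // NODESET results come back as an org.w3c.dom.NodeList
            return (NodeList) xPath.evaluate(xpath, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or evaluate: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String xmlString = "<root><p><a href=\"http://www.something.com/5993810749/\">"
                + "<img src=\"5993810749_107ea7d465_m.jpg\"/></a></p></root>";
        NodeList srcs = select(xmlString, "//img/@src");
        for (int i = 0; i < srcs.getLength(); i++) {
            System.out.println(srcs.item(i).getNodeValue()); // attribute value
        }
    }
}
```

The cast to `NodeList` is what `XPathConstants.NODESET` returns in practice, so the `Object result` from the original snippet can be cast the same way.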
So what is the best way to deal with this, so that I can successfully evaluate XPath expressions against this markup? I see at least two options: (a) make the XPath evaluator smarter, i.e. more tolerant of malformed input; or (b) find a way to automatically repair the malformed XML before evaluating it. Any solution to this problem would be appreciated!
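With the JDK's `javax.xml.xpath`, option (a) mostly reduces to option (b). The evaluator only queries a tree that an XML parser has already built, so any leniency has to come from whatever builds that tree. For real-world HTML, the robust route is a forgiving HTML parser. Third-party libraries such as jsoup, TagSoup or HtmlCleaner can produce a well-formed tree from tag soup (not shown here). Purely as a stdlib-only sketch of (b), the regex pre-pass below assumes the markup only has the problems visible in the sample: curly quotes, unclosed `<img>` and no root element. The helpers `repair` and `imgSources` are hypothetical. They will break on other malformations, such as unescaped `&`, other void tags like `<br>`, unquoted attributes, or a `>` inside an attribute value.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NaiveRepair {

    // An <img ...> whose closing '>' is not preceded by '/'; group 1 is the attribute text
    private static final Pattern UNCLOSED_IMG = Pattern.compile("<img\\b([^>]*?)\\s*(?<!/)>");

    /** Hypothetical pre-pass: fixes only the problems seen in the question's sample. */
    static String repair(String html) {
        String s = html.replace('\u201C', '"').replace('\u201D', '"'); // curly -> straight quotes
        s = UNCLOSED_IMG.matcher(s).replaceAll("<img$1/>");            // <img ...> -> <img .../>
        return "<root>" + s + "</root>";                                 // single document element
    }

    /** Repairs the markup, parses it, and returns every img/@src value in document order. */
    static List<String> imgSources(String html) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(repair(html))));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//img/@src", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the repaired markup", e);
        }
    }

    public static void main(String[] args) {
        // Mirrors the question: curly quotes, <img> not self-closed, no root element
        String html = "<p><a href=\"http://www.something.com/5993810749/\" title=\"IMG_3013\u201D>"
                + "<img src=\"5993810749_107ea7d465_m.jpg\" width=\"240\" height=\"160\" alt=\"IMG_3013\u201D>"
                + "</a></p><p><a href=\"http://www.something.com/836492365986/\" title=\"IMG_3018\u201D>"
                + "<img src=\u201C8364923659_107ea3286465_m.jpg\" width=\u201C365\" height=\u201C248\" alt=\"IMG_3018\u201D>"
                + "</a></p>";
        System.out.println(imgSources(html));
    }
}
```

The regex approach is attractive only because it needs nothing beyond the JDK. Each new kind of breakage needs another rule, which is why a dedicated HTML parser is usually the better long-term choice.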