
I am using StAX in Java to parse an XML document that contains HTML and custom tags.

The XML looks like this:

<html><div>Hello World</div><div><br /></div>
<div><br />
<Resource type="audio/m4a" height="72.00" id="lh6rde3c1d39148804cea99b054f4cc4bb990" width="72.00" />
<br /><br /></div>
<div><br />
</div><div>asfasdfasdfasdf</div><div><br /></div><div><br /></div><div><b>asdfasdfasdfasdf</b></div>
<div>
<b>adsfasdfasdf</b>
</div><div><b><br /></b></div><div><b><i>sdfasdfasdfas</i></b></div><div><i><b>asdfasdfasdfasdf</b>asdfasdfasdfasdf</i>
</div>
<Resource type="video/mp4" height="72.00" id="lh6rde3c1d39148804cesdfd2454f4cc4bb990" width="72.00" />
<div><i>asdfasdfasdfasdfasdf</i></div>
<div><ol><li><i>one</i></li><li><i>wto</i></li><li><i>three</i></li></ol><div>
<i>
asdfasdfasdfasdf</i>
</div><div>
<ul><li><i>one </i></li><li><i>thwo</i></li><li><i>three</i></li></ul></div>
</div></html>

I only need the Resource details (i.e. the attributes of the Resource elements). Is there a better option available in terms of parsing speed?

Linesh Mohan

1 Answer


I don't know the full context in which you're processing this XML, so this answer will be limited.

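However, I can tell you that, classically, SAX and JAXP (the JDK's Java API for XML Processing, which bundles SAX, DOM, and StAX) have been used for this. They don't require a DTD, and with a handler that reacts only to the elements you care about, you can parse just about anything.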
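As a minimal sketch of the SAX approach, the handler below ignores every HTML tag and copies the attributes of each Resource element into a map. The class name `ResourceSaxExtractor` and the shortened sample input are hypothetical, not from the question. It uses only `javax.xml.parsers` and `org.xml.sax` from the JDK and assumes the input is well-formed XML, as the sample in the question appears to be.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the attributes of every Resource element via SAX.
final class ResourceSaxExtractor {

    static List<Map<String, String>> extract(String xml) throws Exception {
        // The factory produces a non-validating parser by default, so no DTD is required.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        List<Map<String, String>> resources = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Every HTML tag is ignored; only Resource elements are recorded.
                if ("Resource".equals(qName)) {
                    Map<String, String> attributes = new LinkedHashMap<>();
                    for (int i = 0; i < attrs.getLength(); i++) {
                        attributes.put(attrs.getQName(i), attrs.getValue(i));
                    }
                    resources.add(attributes);
                }
            }
        };

        parser.parse(new InputSource(new StringReader(xml)), handler);
        return resources;
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the XML from the question.
        String xml = "<html><div>Hello World</div><div><br />"
                + "<Resource type=\"audio/m4a\" height=\"72.00\" id=\"a1\" width=\"72.00\" />"
                + "</div></html>";
        for (Map<String, String> resource : extract(xml)) {
            System.out.println(resource);
        }
    }
}
```

Running `main` prints one map per Resource element, holding its `type`, `height`, `id`, and `width` values.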

jsoup, as mentioned by Rafael Cardoso, is primarily an HTML parser rather than a parser for HTML embedded in XML, but it may work for you. If all you need are the attributes of a specific tag, along with (presumably) any associated data, then the JDK may have everything you need.
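Since the question already uses StAX, here is a comparable sketch with the JDK's StAX cursor API (`XMLStreamReader`); the class name `ResourceStaxExtractor` is again hypothetical. It steps over every event and only inspects start tags named Resource, so no in-memory tree is built. This is a sketch, not a benchmark: whether it is faster than the SAX version for this input would need to be measured.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pulls Resource attributes with the StAX cursor API.
final class ResourceStaxExtractor {

    static List<Map<String, String>> extract(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The input has no DTD, so DTD support is switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<Map<String, String>> resources = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // Text, end tags and HTML elements are stepped over;
                // only START_ELEMENT events named "Resource" are inspected.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Resource".equals(reader.getLocalName())) {
                    Map<String, String> attributes = new LinkedHashMap<>();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        attributes.put(reader.getAttributeLocalName(i),
                                       reader.getAttributeValue(i));
                    }
                    resources.add(attributes);
                }
            }
        } finally {
            reader.close();
        }
        return resources;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Shortened version of the XML from the question.
        String xml = "<html><div><i>text</i></div>"
                + "<Resource type=\"video/mp4\" height=\"72.00\" id=\"v1\" width=\"72.00\" />"
                + "</html>";
        for (Map<String, String> resource : extract(xml)) {
            System.out.println(resource);
        }
    }
}
```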

There are also JDOM, DOM4J, and a number of other libraries, each with its own strengths and weaknesses. This question, therefore, isn't particularly constructive and is basically a duplicate of this one, which you might take a look at.
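For contrast with the streaming approaches, here is a tree-based sketch using the JDK's built-in DOM API (`org.w3c.dom`, not JDOM or DOM4J, which are separate libraries); the class name `ResourceDomExtractor` is hypothetical. DOM loads the whole document into memory before `getElementsByTagName` runs. That is convenient, but it is generally slower and uses more memory than SAX or StAX when only a handful of attributes are needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: builds the full DOM tree, then looks up Resource elements.
final class ResourceDomExtractor {

    static List<Map<String, String>> extract(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The entire document is parsed into memory at this point.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("Resource");
        List<Map<String, String>> resources = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            NamedNodeMap attrs = nodes.item(i).getAttributes();
            Map<String, String> attributes = new LinkedHashMap<>();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                attributes.put(attr.getNodeName(), attr.getNodeValue());
            }
            resources.add(attributes);
        }
        return resources;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><div><br />"
                + "<Resource type=\"audio/m4a\" height=\"72.00\" id=\"a1\" width=\"72.00\" />"
                + "</div></html>";
        for (Map<String, String> resource : extract(xml)) {
            System.out.println(resource);
        }
    }
}
```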

I recommend looking at this tutorial, which explains how to build a parser with the standard library.

halfer
Michael Macha