How to parse HTML Heading

Question

I have this HTML that I am parsing:

<div id="articleHeader">
<h1 class="headline">Assassin's Creed Revelations: The Three Heroes</h1>
<h2 class="subheadline">Exclusive videos and art spanning three eras of assassins.</h2>
<h2 class="publish-date"><script>showUSloc=(checkLocale('uk')||checkLocale('au'));document.writeln(showUSloc ? '<strong>US, </strong>' : '');</script>

<span class="us_details">September 22, 2011</span>

What I want to do is parse the headline, subheadline, and publish date into separate Strings.

Check out this previously asked question: http://stackoverflow.com/questions/2188049/parse-html-in-android — slayton, Sep 23 '11 at 03:06

score 2 · Accepted Answer · answered Sep 23 '11 at 06:15

Just use the proper CSS selectors to grab them.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Fetch the page, then pick each element by its CSS selector.
Document document = Jsoup.connect(url).get();
String headline = document.select("#articleHeader .headline").text();
String subheadline = document.select("#articleHeader .subheadline").text();
String us_details = document.select("#articleHeader .us_details").text();
// ...

Or a tad more efficient:

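import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document document = Jsoup.connect(url).get();
// Select the container once; first() returns null if nothing matches.
Element articleHeader = document.select("#articleHeader").first();
String headline = articleHeader.select(".headline").text();
String subheadline = articleHeader.select(".subheadline").text();
String us_details = articleHeader.select(".us_details").text();
// ...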

score 0 · Answer 2 · answered Sep 23 '11 at 04:40

Android has a SAX parser built in. You can use other standard XML parsers as well.

But I think if your HTML is simple enough, you could use regex to extract the strings.
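For illustration only, here is a minimal sketch of what such a regex extraction might look like for the markup in the question. The class name `HeadingRegex` and the helper `extract` are hypothetical, not part of any library. The sketch assumes that `class` is the first attribute of the tag, that it holds exactly one double-quoted value, and that the element contains plain text with no nested tags. Real-world HTML often breaks these assumptions, which is why a proper HTML parser is usually the safer choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeadingRegex {

    // Hypothetical helper: returns the text between <tag class="cssClass" ...>
    // and </tag>, or null if nothing matches.
    // Assumptions: class is the first attribute, it has a single double-quoted
    // value, and the element body contains no nested tags.
    static String extract(String html, String tag, String cssClass) {
        Pattern pattern = Pattern.compile(
                "<" + Pattern.quote(tag)
                        + "\\s+class=\"" + Pattern.quote(cssClass) + "\"[^>]*>"
                        + "([^<]*)"
                        + "</" + Pattern.quote(tag) + ">",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Simplified copy of the markup from the question (script block omitted).
        String html = "<div id=\"articleHeader\">\n"
                + "<h1 class=\"headline\">Assassin's Creed Revelations: The Three Heroes</h1>\n"
                + "<h2 class=\"subheadline\">Exclusive videos and art spanning three eras of assassins.</h2>\n"
                + "<span class=\"us_details\">September 22, 2011</span>\n";

        System.out.println(extract(html, "h1", "headline"));
        System.out.println(extract(html, "h2", "subheadline"));
        System.out.println(extract(html, "span", "us_details"));
    }
}
```

If the page ever adds another attribute before `class`, a second class name, or a nested tag inside the heading, `extract` returns null instead of the text, so treat this as a quick hack for known, fixed markup only.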

the100rabh


Regex? *Shudder.* Did you miss the `jsoup` tag? – BalusC Sep 23 '11 at 04:43
yup I did miss jsoup, and I like RegEx – the100rabh Sep 23 '11 at 06:07
I like regex too. But, to parse HTML? Totally the wrong tool. – BalusC Sep 23 '11 at 06:12
Not to parse, but to get some specific data from text. It's simpler that way at times. – the100rabh Sep 23 '11 at 10:16
