
I am writing an Android app that reads some info from a website and displays it on the app's screen. I am using the Jsoup library to get the info as a string. First, here's what the website's HTML looks like:

<strong>
   Now is the time<br />
   For all good men<br />
   To come to the aid<br />
   Of their country<br />
</strong>

Here's how I'm retrieving and trying to parse the text:

Document document = Jsoup.connect(WEBSITE_URL).get();
resultAggregator = "";

Elements nodePhysDon = document.select("strong");

//check results
if (nodePhysDon.size()> 0) {
   //get value
   donateResult = nodePhysDon.get(0).text();
   resultAggregator = donateResult;
}

if (!resultAggregator.isEmpty()) {
   // split resultAggregator into an array, breaking it up at each <br />
   String[] donateItems = resultAggregator.split("<br />");
}

But then donateItems[0] is not just "Now is the time"; it's all four strings put together. I have also tried without the space between "br" and "/" and get the same result. If I use resultAggregator.split("br"), then donateItems[0] is just the first word: "Now".
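If the string being split no longer contains any "<br />" text, String.split has nothing to match and returns a one-element array holding the entire input, which is consistent with what donateItems[0] shows here. A minimal plain-Java sketch of that behavior; the sample string and the splitOnLiteralBr helper are hypothetical stand-ins, not part of the original code:

```java
public class SplitDemo {
    // Hypothetical helper: splits on the literal text "<br />", as in the question
    static String[] splitOnLiteralBr(String s) {
        return s.split("<br />");
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a string whose <br /> tags are already gone
        String flattened = "Now is the time For all good men To come to the aid Of their country";

        // The delimiter never occurs, so split() returns the whole input as the only element
        String[] items = splitOnLiteralBr(flattened);

        System.out.println(items.length);               // prints 1
        System.out.println(items[0].equals(flattened)); // prints true
    }
}
```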

I suspect the problem is that Jsoup's select method is stripping the tags out?

Any suggestions? I can't change the website's html. I have to work with it as is.

Jungle Jim
possible duplicate of [How do I preserve line breaks when using jsoup to convert html to plain text?](http://stackoverflow.com/questions/5640334/how-do-i-preserve-line-breaks-when-using-jsoup-to-convert-html-to-plain-text) – luksch Sep 13 '15 at 08:38

1 Answer


Try this:

//check results
if (nodePhysDon.size()> 0) {
   //use toString() to get the selected block with tags included
   donateResult = nodePhysDon.get(0).toString();
   resultAggregator = donateResult;
}

if (!resultAggregator.isEmpty()) {
   // remove the <strong> and </strong> tags
   resultAggregator = resultAggregator.replace("<strong>", "");
   resultAggregator = resultAggregator.replace("</strong>", "");
   // then split at each <br>; items may keep surrounding whitespace, so trim() each one if needed
   String[] donateItems = resultAggregator.split("<br>");
}

Make sure to split with <br> and not <br />: when jsoup serializes the element back to HTML, it writes void tags such as <br /> as <br>.
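If you would rather not depend on exactly how the <br> tag gets serialized, a case-insensitive regex can match <br>, <br/> and <br /> alike. A minimal sketch, assuming you already have the inner HTML of the <strong> element (without the <strong> tags) and that it contains no other markup; the splitOnBr helper and the BrSplitter class are hypothetical names. Note that HTML entities such as &amp; would stay escaped in the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BrSplitter {
    // Matches <br>, <br/>, <br /> and uppercase variants such as <BR>
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");

    // Hypothetical helper: splits an HTML fragment at each <br> tag,
    // trims every piece, and drops pieces that are empty after trimming
    static List<String> splitOnBr(String innerHtml) {
        List<String> lines = new ArrayList<>();
        for (String part : BR.split(innerHtml)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample fragment mixing the three tag spellings
        String innerHtml = "\n   Now is the time<br />\n   For all good men<br>\n"
                + "   To come to the aid<br/>\n   Of their country<br />\n";
        for (String line : splitOnBr(innerHtml)) {
            System.out.println(line);
        }
    }
}
```

With jsoup, the inner HTML could come from nodePhysDon.get(0).html(), which (unlike toString()) leaves out the surrounding <strong> tags, so the replace() calls above would no longer be needed.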

Joel Min