
Hi guys, I'm trying to select the <br /> tags in an HTML page with Jsoup and it's not working. Here is the source of the site:

</div><p><a href="http://www.pinoyfitness.com/wp-content/uploads/2014/03/sofitel-manila-half-marathon-2014-poster.jpg"><img src="http://www.pinoyfitness.com/wp-content/uploads/2014/03/sofitel-manila-half-marathon-2014-poster-540x783.jpg" alt="sofitel-manila-half-marathon-2014-poster" width="540" height="783" class="aligncenter size-medium wp-image-32747" /></a></p>
<p>Introducing the Manila Half Marathon happening on August 17, 2014 at the SM Mall of Asia Grounds. This race is for the benefit of the children of <a href="http://www.virlanie.org/" rel="nofollow" target="_blank">Virlanie</a></p>
 <p><font size="3"><strong>Sofitel Manila Half-Marathon 2014</strong></font><br />
August 17, 2014 @ 3AM<br />
SM Mall of Asia<br />
5K/10K/21K<br />
Organizer: RunRio</p>
<p><strong>Registration Fees:</strong><br />
21K &#8211; P950<br />
10K &#8211; P850<br />
5K &#8211; P750</p>

Here is my work so far:

doc = Jsoup.connect("http://www.pinoyfitness.com/2014/03/manila-half-marathon-august-17-2014/").timeout(0).get();
Element bod = doc.body();
Elements info = bod.select("br");
String textString = info.text();

System.out.println(textString);

I'm trying to retrieve the HTML with the <br /> tags still in it, so that I can easily split the text on them and format the pieces.

But when I select the "p" elements instead, it prints all of the text without the <br /> tags, like this: "Introducing the Manila Half Marathon happening on August 17, 2014 at the SM Mall of Asia Grounds. This race is for the benefit of the children of Virlanie Sofitel Manila Half-Marathon 2014 August 17, 2014 @ 3AM SM Mall of Asia 5K/10K/21K Organizer"

I'm new to Jsoup, so please go easy on me if I've made a newbie error or something like that. Thanks in advance.

user3797088
  • The tag `<br />` doesn't seem that useful. Exactly what content are you trying to retrieve?
    – Reimeus Aug 24 '14 at 12:51
  • I'm trying to retrieve something like this: "introducing the Manila Half Marathon happening on August 17, 2014 at the SM Mall of Asia Grounds. This race is for the benefit of the children of Sofitel Manila Half-Marathon 2014 <br />" so I can split them using <br />, is that right? @Reimeus
    – user3797088 Aug 24 '14 at 12:58
  • Please add that to the question itself - use the [Edit](http://stackoverflow.com/posts/25471758/edit) link – Reimeus Aug 24 '14 at 12:59
  • @Reimeus there I added it :D – user3797088 Aug 24 '14 at 13:08

1 Answer


If you want to preserve the <br/> tags in the parsed content, a somewhat simplistic solution to your problem would be to replace all <br/> tags in the original HTML code with a text placeholder before parsing it. A handy regexp for that:

html.replaceAll("(?i)<br[^>]*>", "br2n")

The placeholder is plain text, so it survives in the extracted text. Then you could do textString.split("br2n") to get the individual lines, if this is what you've been trying to achieve.
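
To make the steps concrete, here is a minimal sketch of the placeholder approach using only the Java standard library. The class name BrSplitter and the method splitOnBr are made up for illustration, and the second replaceAll is only a crude stand-in for the parsing step: in your real code you would presumably pass the marked-up string to Jsoup.parse(...) and call text() on the result instead. It also assumes the token "br2n" never occurs in the page text itself.

```java
import java.util.ArrayList;
import java.util.List;

public class BrSplitter {
    // Placeholder token from the answer; assumed not to appear in the page text.
    private static final String PLACEHOLDER = "br2n";

    // Hypothetical helper: returns the text of an HTML fragment, one entry per <br>-separated line.
    static List<String> splitOnBr(String html) {
        // Step 1: replace every <br>, <br/> or <br /> (any case) with the placeholder.
        String marked = html.replaceAll("(?i)<br[^>]*>", PLACEHOLDER);

        // Step 2: crude stand-in for Jsoup.parse(marked).text(): drop the remaining tags.
        String text = marked.replaceAll("<[^>]+>", "");

        // Step 3: split on the placeholder, trim whitespace, skip empty pieces.
        List<String> lines = new ArrayList<>();
        for (String part : text.split(PLACEHOLDER)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Fragment taken from the page source in the question.
        String html = "<p><strong>Sofitel Manila Half-Marathon 2014</strong><br />\n"
                + "August 17, 2014 @ 3AM<br />\n"
                + "SM Mall of Asia<br />\n"
                + "5K/10K/21K<br />\n"
                + "Organizer: RunRio</p>";
        for (String line : splitOnBr(html)) {
            System.out.println(line);
        }
    }
}
```

Run against the fragment from the question, this should print the event name, date, venue, distances and organizer on separate lines. Note that the regexp matches any tag whose name starts with "br", which is fine for this page but may need tightening elsewhere.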

Community