I'm facing this problem:

I have a string containing some weird HTML stuff, like

String result = "<div id=\"foo\"><div class=\"bar\">xyz</div><div id=\"alert\"><strong>Foo Bar 2% foobar.</strong></div></div>";

(The real string is much larger than this example; it contains a whole web page.)

What I need to do:

  1. Find the line <div id="alert"><strong>Foo Bar 2% foobar.</strong></div>
  2. Extract the number 2 from it (in general this could be [0-9]{1,3}).

My attempt:

String pattern = "<div id=\"alert\"><strong>(.+) (\\d{1,3})% (.+)</strong></div>";
Matcher matcher = Pattern.compile(pattern).matcher(result);
while(matcher.find()) {
    Log.i(TAG, "" + matcher.group());
}

But this does not produce the expected result (I would expect: 2).

I mainly develop in PHP, where this is easy to handle with preg_match, but I don't know how to do it in Java.

Thanks!

webmonkey

1 Answer

Use jsoup to extract the content from the HTML tags, then run a regex on the extracted text.

Download jsoup from http://jsoup.org/download.

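    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    // The input is an HTML string, not a URL, so it is named accordingly.
    String html = "<div id=\"foo\"><div class=\"bar\">xyz</div><div id=\"alert\"><strong>Foo Bar 2% foobar.</strong></div></div>";
    Document doc = Jsoup.parse(html);
    // Select only the <strong> inside the alert div, not every <strong> on the page.
    Elements elements = doc.select("div#alert strong");
    String s = elements.text();
    Pattern p = Pattern.compile("[0-9]{1,3}");
    Matcher m = p.matcher(s);
    while (m.find()) {
        String result = m.group(); // "2" for the sample input
    }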
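
If adding jsoup is not an option, a regex-only route can also work, provided the capture group is read by index: matcher.group() with no argument returns the whole match, not the captured number. The following is only a minimal sketch that assumes the alert markup looks exactly like the sample string; the class name AlertPercent and the helper extractPercent are made up for illustration, and the pattern will break as soon as the real page adds attributes or whitespace to those tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlertPercent {
    // Assumes the exact markup from the sample; group 1 captures the 1-3 digits before '%'.
    private static final Pattern ALERT = Pattern.compile(
            "<div id=\"alert\"><strong>[^<]*?(\\d{1,3})%[^<]*</strong></div>");

    // Hypothetical helper: returns the captured number, or -1 if the alert line is not found.
    static int extractPercent(String html) {
        Matcher m = ALERT.matcher(html);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String html = "<div id=\"foo\"><div class=\"bar\">xyz</div>"
                + "<div id=\"alert\"><strong>Foo Bar 2% foobar.</strong></div></div>";
        System.out.println(extractPercent(html)); // prints 2
    }
}
```

Using [^<]* instead of .+ keeps the match inside the single <strong> element, which matters when the string holds a whole page.
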
Raghunandan