RegEx matching in HTML string

Question

I'm facing this problem:

I have a string containing some weird HTML stuff, like

String result = "<div id=\"foo\"><div class=\"bar\">xyz</div><div id=\"alert\"><strong>Foo Bar 2% foobar.</strong></div></div>";

(The real string is much bigger than in this example; it contains a whole webpage.)

My problem is now:

1. Find the line <div id="alert"><strong>Foo Bar 2% foobar.</strong></div>
2. Extract the number (digit) 2 from it (this could be [0-9]{1,3}).

My attempt:

String pattern = "<div id=\"alert\"><strong>(.+) (\\d{1,3})% (.+)</strong></div>";
Matcher matcher = Pattern.compile(pattern).matcher(result);
while(matcher.find()) {
    Log.i(TAG, "" + matcher.group());
}

But this does not produce the expected result (I would expect: 2).

I mainly develop in PHP, so there it is no problem to handle (preg_match), but I don't know how to do this in Java.

Thanks!

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — wtsang02, Jun 24 '13 at 13:56
`regex` is your problem..don't use it..use html parser to extract div tag's values and then extract digits using regex — Anirudha, Jun 24 '13 at 13:56
@Raghunandan Jsoup sounds interesting. Maybe I could get it working ;-) — webmonkey, Jun 24 '13 at 14:05
@Anirudh Yeah, thought so too, but I could not get a better idea for matching this. — webmonkey, Jun 24 '13 at 14:06

score 1 · Accepted Answer · answered Jun 24 '13 at 14:08

Use jsoup to extract content from html tags. Then you can use regex on the string extracted.

Download jsoup from http://jsoup.org/download.

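    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    String html = "<div id=\"foo\"><div class=\"bar\">xyz</div><div id=\"alert\"><strong>Foo Bar 2% foobar.</strong></div></div>";
    Document doc = Jsoup.parse(html);
    // Select only the <strong> inside the div with id "alert",
    // so other <strong> tags on the page are ignored.
    Elements elements = doc.select("div#alert strong");
    String text = elements.text();
    Pattern p = Pattern.compile("[0-9]{1,3}");
    Matcher m = p.matcher(text);
    while (m.find()) {
        String result = m.group(); // "2" for the sample input
    }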
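
For comparison, here is a rough regex-only sketch, in case pulling in jsoup is not an option. It assumes the alert markup always looks exactly like the sample (same attribute order, no extra whitespace), which is fragile on real pages. The class name `AlertPercent` and the helper `extractPercent` are made up for illustration. The point it shows is that `matcher.group()` with no argument returns the whole match, while `group(1)` returns only the first capturing group, which is the number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlertPercent {
    // Assumption: the alert div is written exactly as in the sample string.
    // The only capturing group holds the 1-3 digits right before '%'.
    static final Pattern ALERT = Pattern.compile(
        "<div id=\"alert\"><strong>.*?(\\d{1,3})%.*?</strong></div>");

    // Hypothetical helper: returns the digits as a String, or null if nothing matches.
    static String extractPercent(String html) {
        Matcher m = ALERT.matcher(html);
        // group() or group(0) would be the whole matched div; group(1) is just the number.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<div id=\"foo\"><div class=\"bar\">xyz</div>"
            + "<div id=\"alert\"><strong>Foo Bar 2% foobar.</strong></div></div>";
        System.out.println(extractPercent(html)); // prints 2
    }
}
```

On Android you would pass the returned value to `Log.i` instead of printing it. The jsoup approach is still the safer choice if the page markup can change.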
