I have two types of strings:

1) "bla bla <a>interesting</a> bla bzzz"
2) "bla bla <b>interesting bla bzzz"

What I need is to capture the "interesting" substring, preferably using one pattern.

So far I have

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(<a>(.*?)</a>)|(<b>(.*?))");
        String message = "bzzzzzz <a>AaA</a>efwef<b>BbB";

        Matcher matcher = pattern.matcher(message);
        while (matcher.find()) {
            // Print every capturing group of the current match
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println(matcher.group(i));
            }
        }
    }
}

The result I would like is

AaA
BbB

But instead I'm getting

<a>AaA</a>
AaA
null
null
null
null
<b>

Any ideas? Thanks

Radek Skokan

1 Answer

<b>(.*?)

will always match <b> and nothing else, because the lazy .*? is already satisfied by the empty string and, with nothing after it in the pattern, never has to expand. Your regex also has more capturing parentheses than it needs.
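To see the lazy quantifier stop immediately, here is a minimal sketch comparing it with its greedy counterpart. The class and method names (LazyDemo, firstCapture) are made up for illustration and are not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyDemo {
    // Hypothetical helper: returns what group 1 captured in the first match,
    // or null if the regex does not match at all.
    public static String firstCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "efwef<b>BbB";
        // Lazy, with nothing after it: the empty capture already succeeds.
        System.out.println("[" + firstCapture("<b>(.*?)", input) + "]"); // prints []
        // Greedy: consumes the rest of the line.
        System.out.println("[" + firstCapture("<b>(.*)", input) + "]");  // prints [BbB]
    }
}
```

The greedy version is only shown for contrast; it would swallow everything up to the end of the line, which is not what the question asks for either.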

Try

Pattern pattern = Pattern.compile("<a>(.*?)</a>|<b>(\\S*)");

The second half of this pattern captures the run of non-whitespace characters (\S*) that follows <b>, so the capture ends at the first space.
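Only one alternative takes part in each match, so the other group is null. A minimal sketch of picking whichever group participated follows; the class and method names (InterestingExtractor, extract) are illustrative assumptions, not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InterestingExtractor {
    // Group 1 belongs to the <a>...</a> branch, group 2 to the <b> branch.
    private static final Pattern PATTERN = Pattern.compile("<a>(.*?)</a>|<b>(\\S*)");

    public static List<String> extract(String message) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(message);
        while (matcher.find()) {
            // Exactly one of the two groups is non-null for any given match.
            String hit = matcher.group(1) != null ? matcher.group(1) : matcher.group(2);
            result.add(hit);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : extract("bzzzzzz <a>AaA</a>efwef<b>BbB")) {
            System.out.println(s); // prints AaA, then BbB
        }
    }
}
```

Checking for null rather than looping over groupCount() is what keeps the unwanted null lines out of the output.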

Tim Pietzcker
  • Just to add: some checks are needed to figure out which one to print as output. – nhahtdh May 17 '13 at 09:24
  • Thanks Tim! That solves my major issue. There are still 2 nulls between AaA and BbB, but I can live with that. – Radek Skokan May 17 '13 at 09:29
  • @RadekS: Those nulls appear because only one of the alternatives can match, but you're iterating over both capturing groups, so one of them never participates in a given match. – Tim Pietzcker May 17 '13 at 09:31