
Given a string containing some number of square brackets and other characters, I want to find all closing square brackets preceded by an opening square bracket and some number of letters. For instance, if the string is

] [abc] [123] abc]

I want to find only the second closing bracket.

The following regex

(?<=[a-z]+)\]

will find me the second closing bracket, but also the last one:

] [abc] [123] abc]

Since I want only the first of those two matches (the one after [abc]), I make the obvious change to the regex...

(?<=\[[a-z]+)\]

...and I get "Look-behind group does not have an obvious maximum length near index 11."

\[ is only a single character, so it seems like the obvious maximum length should be 1 + whatever the obvious maximum length was of the look-behind group in the first expression. What gives?


ETA: It's not specific to the opening bracket.

(?<=a[b-z]+)\]

gives me the same error. (Well, at index 12.)

David Moles

2 Answers


\[ is only a single character, so it seems like the obvious maximum length should be 1 + whatever the obvious maximum length was of the look-behind group in the first expression. What gives?

That's the point: "whatever the obvious maximum length was of the look-behind group in the first expression" is not obvious. A rule of thumb is that you can't use + or * inside a look-behind. This isn't specific to Java's regex engine; many other PCRE-flavored engines reject them too (even Perl's v5.10 engine!).
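If you can put an upper bound on the number of letters, a look-behind with a bounded quantifier such as {1,100} does have an obvious maximum length (one character for the '[' plus at most 100 letters), so Java accepts it. A minimal sketch, assuming 100 letters is enough; the bound and the class and method names are illustrative choices, not part of the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookBehind {
    // {1,100} is an assumed upper bound on the number of letters. Unlike '+',
    // it gives the look-behind an obvious maximum length ('[' plus 100 letters).
    static final Pattern BRACKET_AFTER_LETTERS =
            Pattern.compile("(?<=\\[[a-z]{1,100})\\]");

    // Returns the index of every ']' preceded by '[' and 1..100 lowercase letters.
    static List<Integer> closingBracketIndices(String s) {
        List<Integer> result = new ArrayList<>();
        Matcher m = BRACKET_AFTER_LETTERS.matcher(s);
        while (m.find()) {
            result.add(m.start()); // the match is just the ']' itself
        }
        return result;
    }

    // Only the ']' is matched, so it can be replaced without touching the letters.
    static String replaceClosingBrackets(String s, String replacement) {
        return BRACKET_AFTER_LETTERS.matcher(s).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(closingBracketIndices("] [abc] [123] abc]"));
        System.out.println(replaceClosingBrackets("] [abc] [123] abc] [foo] bar]", "_"));
    }
}
```

A run of more than 100 letters before the ']' will not be matched; raise the bound if your data needs it.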

You can do this with a look-ahead, however:

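import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern p = Pattern.compile("(?=(\\[[a-z]+]))");
Matcher m = p.matcher("] [abc] [123] abc]");
while (m.find()) {
  // group 1 is captured inside the look-ahead; end(1) is the index just past the ']'
  System.out.println("Found a ']' before index: " + m.end(1));
}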

(That is, a capture group inside a look-ahead (!), whose end(1) gives the position just past the matched ']'.)

will print:

Found a ']' before index: 7
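A small sketch that wraps the same look-ahead pattern in a helper returning the index of each matching ']' itself, i.e. end(1) - 1. The class and method names are hypothetical; the pattern is unchanged. Because the look-ahead is zero-width, find() keeps scanning after each match, so a string with several such brackets reports all of them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookAheadBrackets {
    // The whole "[letters]" is captured in group 1 inside a zero-width
    // look-ahead, so the match itself consumes nothing.
    static final Pattern P = Pattern.compile("(?=(\\[[a-z]+]))");

    // Returns the index of each ']' that closes a '[' followed only by letters.
    static List<Integer> closingBracketIndices(String s) {
        List<Integer> result = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            // end(1) is one past the captured ']', so the ']' is at end(1) - 1
            result.add(m.end(1) - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(closingBracketIndices("] [abc] [123] abc]"));
        System.out.println(closingBracketIndices("] [abc] [123] abc] [foo] bar]"));
    }
}
```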

EDIT

And if you want to replace those ]s, you could do something like this:

String s = "] [abc] [123] abc] [foo] bar]";
System.out.println(s);
System.out.println(s.replaceAll("(\\[[a-z]+)]", "$1_"));

which will print:

] [abc] [123] abc] [foo] bar]
] [abc_ [123] abc] [foo_ bar]
Bart Kiers
  • @David, probably a recurring bug in the regex-API (see: http://stackoverflow.com/questions/1536915/regex-look-behind-without-obvious-maximum-length-in-java and http://bugs.sun.com/view_bug.do?bug_id=6695369) – Bart Kiers Oct 06 '11 at 19:48
  • So am I right in thinking then that there isn't a regex that will get me just the bracket -- e.g., something I can use in `String.replace()`? – David Moles Oct 06 '11 at 19:56
  • No, you're not right :). See my edit. (Note that `replace(...)` doesn't use regex; it's `replaceAll(...)` or `replaceFirst(...)` that do.) – Bart Kiers Oct 06 '11 at 20:07
  • Sorry, I meant to say `replaceAll()`. – David Moles Oct 06 '11 at 20:16
 "^[^\[]*\[[^\]]*?(\])"

Is group(1) what you want?
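A sketch of the pattern above written as a Java string literal (each backslash doubled) and used to read group(1); the class and method names are hypothetical. As written, the pattern anchors at the start, skips to the first '[', and captures the first ']' after it. It does not check that the bracketed text is letters, so on "] [123] [abc]" it reports the ']' after 123:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstClosingBracket {
    // ^[^\[]*  : everything before the first '['
    // \[       : the first '['
    // [^\]]*?  : lazily skip non-']' characters
    // (\])     : capture the first ']' after it in group 1
    static final Pattern P = Pattern.compile("^[^\\[]*\\[[^\\]]*?(\\])");

    // Returns the index of the ']' captured in group 1, or -1 if there is no match.
    static int firstClosingBracketIndex(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstClosingBracketIndex("] [abc] [123] abc]"));
        System.out.println(firstClosingBracketIndex("] [123] [abc]"));
    }
}
```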

Kent