3

I'd like to create a regex that matches unmatched right square brackets. Examples:

]ichael ==> match ]

[my name is Michael] ==> no match

No nested pairs of square brackets occur in my text.

I tried to use a negative lookbehind for that; more specifically, I used this regex: (?<!\[(.)+)\] but it doesn't seem to do the trick.

Any suggestions?

Yannis P.
  • Which regex flavor are you using? – Martin Ender Jul 01 '13 at 13:15
  • I am trying RegExr to test things a bit, but I don't know which engine it is using. I'll apply it with either Java or Python – Yannis P. Jul 01 '13 at 13:31
  • That uses the ECMAScript flavor as implemented by ActionScript. Better use a tester that uses the flavor you will use eventually, like http://www.regexplanet.com/ – Martin Ender Jul 01 '13 at 13:32
  • Your insistence that the regex consume the errant bracket and nothing else is making the job much more difficult than it needs to be. Why do you have to do it that way? If you can explain that, we might be able to devise a better approach. Help us help you! – Alan Moore Jul 01 '13 at 15:42
  • @AlanMoore Hey Alan, here is the thing: I have a text that I want to cleanse. In the text, whenever a word starts with an unmatched `]`, e.g. `]ichael`, I know that a special character should appear in its place. However, the text also contains parts where square brackets are used normally, as in `[my name is Michael]`. I am able to circumvent this, but I thought I'd play a bit with regexes as well, just for the sake of it – Yannis P. Jul 01 '13 at 16:19
  • Yes, and the easiest way to do that is to capture everything that precedes the bad bracket and plug it into the result with a group reference (e.g. `$1`). One of my favorite rules of thumb is, if you're not sure how to write the regex you need, lookbehind should be the *last* tool you reach for, not the first. There's almost always a better but less obvious way. – Alan Moore Jul 01 '13 at 16:41
  • @AlanMoore actually, since he is just looking for a single character (and knows which one it will be), it would be even easier to just do the matching without any capturing, figure out where the match ended, and then replace that particular character in the string with whatever he wants. – Martin Ender Jul 01 '13 at 17:10
  • It could be even simpler than that. If the regex flavor is Perl or PHP he can use `\K`, which has almost exactly the effect you described, but entirely within the regex. – Alan Moore Jul 01 '13 at 19:23
  • @YannisP, this is one of the reasons why you should avoid lookbehinds. Most of the Perl-like regex flavors support the same set of core features, which behave almost exactly the same in all of them. But the behavior of lookbehinds can be wildly different from one flavor to the next. – Alan Moore Jul 01 '13 at 19:25
  • Thank you all for the constructive comments. Unfortunately I am not confident with either Perl or regexes, but I am on my way with the latter. – Yannis P. Jul 02 '13 at 09:06

4 Answers

3

Unless you are using .NET, lookbehinds have to be of fixed length. Since you just want to detect whether there are any unmatched closing brackets, you don't actually need a lookbehind though:

^[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*\]

If this matches, you have an unmatched closing bracket.

It's a bit easier to understand if you realise that [^\[\]] is a negated character class that matches anything but square brackets, and if you lay it out in free-spacing mode:

^              # start from the beginning of the string
[^\[\]]*       # match non-bracket characters
(?:            # this group matches matched brackets and what follows them
  \[           # match [
  [^\[\]]*     # match non-bracket characters
  \]           # match ]
  [^\[\]]*     # match non-bracket characters
)*             # repeat 0 or more times
\]             # match ]

So this tries to find a ] after matching 0 or more well-matched pairs of brackets.

Note that the part between ^ and ] is functionally equivalent to Tim Pietzcker's solution (which is a bit easier to understand conceptually, I think). What I have done is apply an optimization technique called "unrolling the loop". If your flavor provides possessive quantifiers, you can turn all * into *+ to increase efficiency even further.
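As a rough illustration, the pattern above could be used from Python like this. It is only a sketch: the helper name `first_unmatched_index` is made up for this example, and it relies on the question's assumption that brackets are never nested. Because the match starts at the beginning of the string and ends on the stray `]`, the end of the match gives the bracket's position.

```python
import re

# Unrolled pattern from the answer: any number of balanced, non-nested
# [...] pairs (with non-bracket text around them), followed by a stray "]".
UNMATCHED = re.compile(r"^[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*\]")

def first_unmatched_index(text):
    """Return the index of the first unmatched "]", or None if there is none."""
    m = UNMATCHED.match(text)
    # The match always ends right after the stray bracket.
    return m.end() - 1 if m else None

for sample in ["]ichael", "[my name is Michael]", "[abc]def]"]:
    print(repr(sample), first_unmatched_index(sample))
```

Knowing the index, the single character can then be replaced with ordinary string slicing, without any lookbehind.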


About your attempt

Even if you are using .NET, the problem with your pattern is that . allows you to go past other brackets. Hence, you'd get no match in

[abc]def]

Because both the first and the second ] have a [ somewhere in front of them. If you are using .NET, the simplest solution is

(?<!\[[^\[\]]*)\]

Here we use non-bracket characters in the repetition, so that we don't look past the first [ or ] we encounter to the left.
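For comparison, Python's built-in `re` module is one of the flavors that rejects a variable-length lookbehind like this one outright. The sketch below only checks whether a pattern compiles; the helper name `compiles` is hypothetical.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-length lookbehind: not accepted by Python's re module.
lookbehind_ok = compiles(r"(?<!\[[^\[\]]*)\]")

# The anchored pattern without any lookbehind is accepted.
anchored_ok = compiles(r"^[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*\]")

print(lookbehind_ok, anchored_ok)
```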

Martin Ender
  • Be aware that if you end up using Java you'll have to escape *all* of the literal brackets: `[^\[\]]*\]`. Then you'll have to escape the escapes when you write it as a Java string literal: `"[^\\[\\]]*\\]"`. – Alan Moore Jul 01 '13 at 16:17
  • @AlanMoore thanks. I wasn't aware that Java doesn't allow unambiguously unescaped brackets. – Martin Ender Jul 01 '13 at 16:21
  • It does automatically escape a closing bracket if it's the first character listed (e.g. `[]]`, `[^]]`). I usually escape it anyway; the readability hit of the extra characters is more than offset by the increased visual symmetry. It makes it easier to port the regex to other flavors, too. – Alan Moore Jul 01 '13 at 16:59
  • @AlanMoore fair enough, good point. Let's conclude, matching square brackets with regex is the worst. – Martin Ender Jul 01 '13 at 17:07
  • When you say lookbehinds must have a fixed length, does that mean I cannot use the "at least _m_, and at most _n_" notation like: `(?<=\w{1,4})`? – Lucas Feb 19 '15 at 23:14
  • @LucasMorgan That depends on the flavour you're using. In most of them, it won't work, but I believe there are a couple of flavours which allow variable-length as long as there is a finite number of possible lengths. I don't recall right now which ones allow that, though. – Martin Ender Feb 19 '15 at 23:48
2

You don't need lookaround at all (and it would be difficult to use, since most languages don't allow unlimited-length lookbehind assertions):

((?:\[[^\[\]]*]|[^\[\]]*)*+)\]

will match any text that ends in a closing bracket unless there's a corresponding opening bracket before it. It does not (and according to your question doesn't need to) handle nested brackets.

The part before the ] can be found in $1 so you can reuse it later.

Explanation:

(           # Match and capture in group number 1:
 (?:        # the following regex (start of non-capturing group):
  \[        # Either a [
  [^\[\]]*  # followed by non-brackets
  \]        # followed by ]
 |          # or
  [^\[\]]*  # Any number of non-bracket characters
 )*+        # repeat as needed, match possessively to avoid backtracking
)           # End of capturing group
\]          # Match ]
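In Python, the same idea (keep the prefix via group 1 and swap out the bracket) might look like the sketch below. It deliberately uses an anchored, unrolled prefix instead of the possessive `*+`, because possessive quantifiers are only available in Python's `re` module from version 3.11 on. The function name `replace_unmatched` and the replacement text are assumptions made for illustration.

```python
import re

# Group 1 captures everything before the stray "]": non-bracket text and
# balanced, non-nested [...] pairs, anchored at the start of the string.
PREFIX_THEN_STRAY = re.compile(r"^([^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*)\]")

def replace_unmatched(text, replacement):
    """Replace every unmatched "]" in text with replacement."""
    if "[" in replacement or "]" in replacement:
        # A bracket in the replacement could create new unmatched brackets.
        raise ValueError("replacement must not contain square brackets")
    while True:
        # Each pass fixes the first stray bracket and keeps the prefix.
        text, count = PREFIX_THEN_STRAY.subn(
            lambda m: m.group(1) + replacement, text, count=1
        )
        if count == 0:
            return text

print(replace_unmatched("]ichael", "M"))
print(replace_unmatched("[my name is Michael]", "M"))
```

The loop rescans from the start after each replacement, which is simple but quadratic in the worst case; for long texts with many stray brackets, tracking the position as in a single left-to-right scan would be a better fit.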
Tim Pietzcker
  • @m.buettner: Yes, I just noticed that, too :) – Tim Pietzcker Jul 01 '13 at 13:18
  • Hi Tim. Well, perhaps [RegExr](http://gskinner.com/RegExr/) is not the best place to test, but your regex is capturing the text that is included between square brackets as well. And what if I only want to capture the right bracket and not the text? – Yannis P. Jul 01 '13 at 13:28
  • @YannisP.: It has to match it, otherwise it wouldn't know whether the next `]` is single or not. As for your second question, that depends on your regex engine (m.buettner asked for that info a while ago, remember?). – Tim Pietzcker Jul 01 '13 at 13:31
  • @YannisP. so you want to match just the `]` instead of the whole text. May I ask why? Although it *is* possible with non-.NET regex, it would probably complicate things a lot, and I suspect there might be better solutions to your ultimate goal. – Martin Ender Jul 01 '13 at 13:52
0

This should do it:

'^[^\[]*\]'

Basically, it says to pick out any closing square bracket that doesn't have an opening square bracket between it and the beginning of the line.

Dave Sexton
  • Thanks Dave. How could I just match the right ']' on the string? – Yannis P. Jul 01 '13 at 13:33
  • Not sure what it is you are trying to do, but you could use this '(?<=^[^\[]*)\]' which uses a lookbehind. But what is the point of matching a square bracket when you know it's a square bracket? – Dave Sexton Jul 01 '13 at 13:56
  • @DaveSexton then you have a negative lookbehind of variable length again. – Martin Ender Jul 01 '13 at 14:03
  • That regex works correctly whenever the closing bracket is the first square bracket it sees, but not if there's a balanced pair of brackets ahead of it, as in `[I've got a nickel]]`. – Alan Moore Jul 01 '13 at 15:58
-1
\](.*)

Will match on everything after the ]:

]ichael -> ichael
[my name is Michael] ->
Sebastián Palma
Lars