I have a string like

{! texthere }

I want to capture everything from {! up to either the first } or the end of the string, whichever comes first. So if I had

{!text here} {!text here again} {!more text here. Oh boy!

I would want ["{!text here}", "{!text here again}", "{!more text here. Oh boy!"]

I thought this would work

{!.*}??

but on the string above it returns a single match: ["{!text here} {!text here again} {!more text here. Oh boy!"]

I'm still very inexperienced with regexes, so I don't understand why this doesn't work. I would expect it to match '{!' followed by any number of characters up to a closing brace (non-greedily), which may not be there.
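To see why, here is a minimal sketch assuming Python's `re` module (the braces are escaped only for readability; Python treats them as literals here either way). The greedy `.*` runs all the way to the end of the string first, and `}??` is a lazy "zero or one", so it is satisfied by matching nothing and the engine never gives characters back:

```python
import re

s = "{!text here} {!text here again} {!more text here. Oh boy!"

# The pattern from the question. `.*` is greedy, so it consumes everything
# up to the end of the line; `\}??` (lazy "zero or one }") then succeeds by
# matching zero characters, so no backtracking is needed.
matches = re.findall(r"\{!.*\}??", s)
print(matches)  # one match covering the whole string
```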

user1652427
  • The easiest way to adapt this is to use non-greedy Kleene star, usually represented as `.*?`. Are you using a regular expression language that has that available? Not all do. – Celada Feb 15 '13 at 14:53
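The non-greedy idea from this comment can be sketched in Python as follows. One addition is ours, not the comment's: a lazy `.*?` followed by an optional `}` would stop after zero characters, so this sketch ends each match at either a `}` or the end of the string, written as the alternation `(?:\}|$)`:

```python
import re

s = "{!text here} {!text here again} {!more text here. Oh boy!"

# `.*?` is lazy: it grows one character at a time and stops as soon as the
# rest of the pattern can match. `(?:\}|$)` (our addition) lets a match end
# at the first "}" or, for the unterminated last item, at the end of the string.
pattern = re.compile(r"\{!.*?(?:\}|$)")
matches = pattern.findall(s)
print(matches)
```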

4 Answers

Using positive lookbehind (?<={!)[^}]+:

In [8]: import re

In [9]: s = "{!text here} {!text here again} {!more text here. Oh boy!"

In [10]: re.findall(r'(?<={!)[^}]+', s)
Out[10]: ['text here', 'text here again', 'more text here. Oh boy!']

This is a positive lookbehind: one or more non-} characters are matched only when they are preceded by {!. The {! itself is not included in the returned text.
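A short Python sketch with `re.finditer` (the variable names `s` and `starts` are ours) makes the zero-width behavior of the lookbehind visible: each match's start offset lands just after the `{!` prefix, which is checked but never consumed:

```python
import re

s = "{!text here} {!text here again} {!more text here. Oh boy!"

# (?<=\{!) asserts that "{!" immediately precedes the current position
# without including it in the match; [^}]+ then takes non-} characters.
starts = []
for m in re.finditer(r"(?<=\{!)[^}]+", s):
    starts.append(m.start())
    # The two characters just before each match are the "{!" prefix.
    print(m.start(), repr(s[m.start() - 2:m.start()]), repr(m.group()))
```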

Chris Seymour
    Ahh amazing! I tried doing a negative lookahead at some point, but I guess I need to read over lookarounds and actually understand them. This is what I needed. I'll mark it as an answer as soon as I'm allowed to! – user1652427 Feb 15 '13 at 14:57

I believe you want to use a reluctant quantifier:

{!.*?}?

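This causes the . to stop matching as soon as the first following } is found, instead of the last. Note, however, that because the trailing } is optional, the reluctant .*? is already satisfied after zero characters, so on its own this pattern matches only {!.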
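A quick Python check (a sketch, with braces escaped for readability) shows the trade-off. With the trailing `}` optional, the lazy `.*?` stops after zero characters, so each match is just `{!`; making the `}` mandatory fixes the first two items but drops the unterminated last one:

```python
import re

s = "{!text here} {!text here again} {!more text here. Oh boy!"

# Lazy body + optional brace: zero characters is already a successful match.
optional_brace = re.findall(r"\{!.*?\}?", s)

# Lazy body + required brace: stops at the first "}", but the last item
# has no "}", so it cannot match at all.
required_brace = re.findall(r"\{!.*?\}", s)

print(optional_brace)
print(required_brace)
```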

I had a question about greedy and reluctant quantifiers that has a good answer here.

Another option would be to specify the characters that are allowed to come between the two curly braces like so:

{![^}]*}?

The negated class [^}] means no closing curly brace can be matched inside the pattern, so each match ends at the first }, or at the end of the string if there is none.
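Applied to the question's sample string with Python's `re` module (a sketch, with braces escaped for readability), the negated-class pattern returns exactly the three requested pieces, including the final one that has no closing brace:

```python
import re

s = "{!text here} {!text here again} {!more text here. Oh boy!"

# [^}]* can never step over a "}", so each match ends at the first closing
# brace; the trailing \}? then includes that brace when it is present.
matches = re.findall(r"\{![^}]*\}?", s)
print(matches)
```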

jjnguy
  • @Nolonar, not sure actually. I just copied the regex they were using and added the reluctant modifier to the `*`. – jjnguy Feb 15 '13 at 14:55
  • On the example string in my question, I'm getting just 3 matches of {! – user1652427 Feb 15 '13 at 14:55
  • I believe `??` is a Perl construct which means "Match 0 or 1 time, not greedily". – Jonah Bishop Feb 15 '13 at 14:56
  • `??` has a special meaning and I can't remember what it is – Lightness Races in Orbit Feb 15 '13 at 14:56
  • According to http://msdn.microsoft.com/en-us/library/az24scfc.aspx the `??` at the end "Matches the previous element zero or one time, but as few times as possible", though I don't understand why he'd want the match to contain `{!` yet not `}` – Nolonar Feb 15 '13 at 14:57
  • @jjnguy: By using `??` at the end, you will end up only matching `{!` for the first regex. The second regex works due to the greedy quantifier of `[^}]*` – nhahtdh Feb 15 '13 at 15:00
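The behavior of `??` described in these comments can be checked in Python, whose `re` module also supports it (a small sketch; the input strings are made up for illustration). `b?` takes the `b` when it can, while `b??` first tries zero occurrences and only takes the `b` if the rest of the pattern requires it:

```python
import re

greedy = re.match(r"ab?", "ab").group()     # "ab": ? prefers one occurrence
lazy = re.match(r"ab??", "ab").group()      # "a": ?? prefers zero occurrences
forced = re.match(r"ab??c", "abc").group()  # "abc": zero fails on "c", so it backtracks to one
print(greedy, lazy, forced)
```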

You can do it this way:

({![^}]+}?)

Then retrieve capture group $1, which corresponds to the first set of parentheses.

With this approach you have to use a "match all" type of function, because the regex itself is written to match a single occurrence.
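In Python (assumed here; Python exposes what the answer calls `$1` as `group(1)`), the difference between a single-match function and a "match all" function looks like this. A sketch, with braces escaped for readability:

```python
import re

s = "{!text here} {!text here again} {!more text here. Oh boy!"
pattern = re.compile(r"(\{![^}]+\}?)")

# re.search stops at the first match, so group(1) holds only one item.
first = pattern.search(s).group(1)

# findall scans the whole string; with exactly one group in the pattern
# it returns that group's text for every match.
all_items = pattern.findall(s)

print(first)
print(all_items)
```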

This approach doesn't use any lookaround. Also, using [^}] should limit the number of regex engine steps, since the engine stops at the next } instead of matching the whole expression and then backtracking.

Hugo Dozois

If your tool/language supports Perl-style regexes (lookbehind in particular), try this:

(?<={!)[^}]*
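This is the same lookbehind as in the first answer, except with `*` instead of `+`. In Python (a sketch; the input `"{!} {!x}"` is a made-up example) the difference only shows up for an empty `{!}`: `*` yields an empty string for it, while `+` skips it:

```python
import re

s = "{!} {!x}"

# * allows zero characters, so the empty "{!}" produces an empty match.
star = re.findall(r"(?<=\{!)[^}]*", s)

# + requires at least one character, so the empty "{!}" is skipped.
plus = re.findall(r"(?<=\{!)[^}]+", s)

print(star)
print(plus)
```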
Kent