I have the following parsing scenario in Python. A line can take one of three forms:

  1. {{ name xxxxxxCONTENTxxxxx /}}
  2. {{ name }} xxxxxxxCONTENTxxxxxxx {{ name /}}
  3. {{ name xxxxxxCONTENTxxx {comand} xxxxCONTENTxxx /}}

All I need to do is determine which case a given line belongs to, using regex.

I can successfully distinguish between 1) and 2), but I am having trouble dealing with 3).

To catch 1) I use:

re.match(r'\s*{{[^{]*?/}}\s*', line)

To catch 2) I use:

re.match(r'{{.*?}}', line)

and then raise a flag to keep the context, since case 2) can span multiple lines. How can I catch case 3)?

The condition which I'm currently trying to match is to test for:

- start with '{{'
- end with '/}}'
- with no '{{' in between

However, I'm having a hard time phrasing this as a regex.

Thomas Ayoub
Gerrie van Wyk
  • '^{{((?!{{).)*/}}$' - See [Regular expression to match line that doesn't contain a word?](http://stackoverflow.com/questions/406230/regular-expression-to-match-line-that-doesnt-contain-a-word) – Tom Rees Apr 05 '16 at 08:15
  • This works well in js but having trouble with it in python. In js it catches condition 1 and 2 which is good, but in python it gives no match. – Gerrie van Wyk Apr 05 '16 at 08:33
  • Using http://pythex.org/ (great site btw :) ) I get that the regex matches 1 and 3, but not 2 - because it has '{{' in it. Could you post your code that didn't work? – Tom Rees Apr 05 '16 at 08:45
  • Maybe [`{{(?:(?!{{).)*/}}`](https://regex101.com/r/jB1vM5/3)? (maybe `re.DOTALL` is necessary if it spans across multiple lines) – Wiktor Stribiżew Apr 05 '16 at 08:55
  • Could you please narrow your question to what you exactly need to match and what not to match? Do you want to match `{{ name xxxxxxCONTENTxxxxx /}}` and `{{ name xxxxxxCONTENTxxx {comand} xxxxCONTENTxxx /}}` as entire strings, and not match at all `{{ name }} xxxxxxxCONTENTxxxxxxx {{ name /}}`? – Wiktor Stribiżew Apr 05 '16 at 09:00
  • @WiktorStribiżew I think OP is just trying to classify a string in 3 options – rock321987 Apr 05 '16 at 09:01
  • @rock321987: You know that regexes do not "classify", they either match or not. That is what we need to understand to answer the question. Else, we can only guess. – Wiktor Stribiżew Apr 05 '16 at 09:02
  • @WiktorStribiżew Maybe he is trying to `match` and then `classify` – rock321987 Apr 05 '16 at 09:03
  • @rock321987: See, you are guessing :) – Wiktor Stribiżew Apr 05 '16 at 09:04
  • @WiktorStribiżew Yes I am..:) – rock321987 Apr 05 '16 at 09:04
  • @GerrievanWyk: [`^{{(?:(?!{{|/}}).)*/}}$`](https://regex101.com/r/mG0oI6/2)? Or even `(?s)^{{(?:(?!{{|/}}).)*/}}$`? – Wiktor Stribiżew Apr 05 '16 at 09:09
  • I'm classifying by checking match with if statements. Sorry for the confusion. The following worked for me in python to catch condition 1 and 3 : \s*{{((?!{{).)*?/}}\s* .. Thanks for the help! – Gerrie van Wyk Apr 05 '16 at 10:07

1 Answer

The conditions:

- start with '{{'
- end with '/}}'
- with no '{{' in between

are a perfect fit for a tempered greedy token.

^{{(?:(?!{{|/}}).)*/}}$
   ^^^^^^^^^^^^^^^^

See regex demo.

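The (?:(?!{{|/}}).)* part matches any character, zero or more times, as long as that character does not start a {{ or /}} sequence (so it matches up to the first /}}). The anchors (^ and $) ensure that only a whole string that starts with {{, ends with /}} and has no {{ inside is matched. Note that with re.match, you do not need the ^ anchor, because re.match only matches at the start of the string; the $ anchor is still required.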
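A minimal Python sketch of the pattern above, run against the three sample lines from the question. The names SELF_CLOSED, samples and results are only illustrative and do not come from the original post.

```python
import re

# Tempered greedy token: consume one character at a time, but only while
# the current position does not start "{{" or "/}}".
SELF_CLOSED = re.compile(r"^{{(?:(?!{{|/}}).)*/}}$")

# The three sample lines from the question.
samples = [
    "{{ name xxxxxxCONTENTxxxxx /}}",
    "{{ name }} xxxxxxxCONTENTxxxxxxx {{ name /}}",
    "{{ name xxxxxxCONTENTxxx {comand} xxxxCONTENTxxx /}}",
]

# True where the whole line starts with "{{", ends with "/}}"
# and has no "{{" in between.
results = [bool(SELF_CLOSED.match(s)) for s in samples]
print(results)
```

For these samples the list is [True, False, True]: cases 1 and 3 match, and case 2 is rejected because of its second {{. The pattern does not allow surrounding whitespace, so a line may need line.strip() first.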

Now, to match only the 3rd type of string, the pattern must also require a {...} substring:

^{{(?:(?!{{|/}}).)*{[^{}]*}(?:(?!{{|/}}).)*/}}$
   | ----  1 -----|| - 2 -||--------1-----|

See another regex demo

Part 1 is the tempered greedy token described above, and part 2, {[^{}]*}, matches a single {...} substring, making it compulsory in the input.
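One way the two patterns could be combined into the if-based classification described in the question is sketched below. The function classify and the constant names are hypothetical, and the case 2 check simply reuses the {{.*?}} pattern from the question. The order of the checks matters: case 3 is tested first, because the more general pattern also matches case 3 lines.

```python
import re

# Tempered greedy token shared by both patterns.
TOKEN = r"(?:(?!{{|/}}).)*"

# Case 3: self-closed tag that contains a single {...} substring.
CASE_3 = re.compile(r"^{{" + TOKEN + r"{[^{}]*}" + TOKEN + r"/}}$")
# Cases 1 and 3: self-closed tag with no "{{" inside.
CASE_1_OR_3 = re.compile(r"^{{" + TOKEN + r"/}}$")
# Case 2: an opening tag, using the pattern from the question.
CASE_2_OPEN = re.compile(r"{{.*?}}")


def classify(line):
    """Return 1, 2 or 3 for the cases in the question, or None."""
    text = line.strip()  # assumption: surrounding whitespace is irrelevant
    if CASE_3.match(text):
        return 3
    if CASE_1_OR_3.match(text):
        return 1
    if CASE_2_OPEN.match(text):
        return 2
    return None


print(classify("{{ name xxxxxxCONTENTxxx {comand} xxxxCONTENTxxx /}}"))
```

This only classifies a single line. Tracking the multi-line context of case 2 would still need the flag mentioned in the question.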

Wiktor Stribiżew