Python regex - Replace all characters except those between braces

Question

I'm a bit stuck with a regular expression. I have a string in the format

{% 'ello %} wor'ld {% te'st %}

and I want to escape only apostrophes that aren't between {% ... %} tags, so the expected output is

{% 'ello %} wor&quot;ld {% te'st %}

I know I can replace all of them using the string replace function, but I'm at a loss as to how to use a regex to match only those outside the braces.

Can your {% thingies %} nest? – tchrist Nov 06 '11 at 22:48

Petar Ivanov · Accepted Answer · 2011-11-06T22:33:35.657

5

This can probably be done with regex, but it would be a complicated one. It's easier to write and read if you just do it directly:

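def escape(s):
    isIn = False  # True while we are between "{%" and "%}"
    ret = []
    for i in range(len(s)):
        # Escape apostrophes only when outside a {% ... %} tag.
        if not isIn and s[i] == "'":
            ret.append("&quot;")
        else:
            ret.append(s[i])

        # Update the state after the current character has been handled.
        if isIn and s[i:i+2] == "%}":
            isIn = False
        elif not isIn and s[i:i+2] == "{%":
            isIn = True

    return "".join(ret)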
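The question does not say whether tags can nest. If they can, one possible variation of the loop above keeps a depth counter instead of a boolean. This is only a sketch under that assumption; the name escape_nested is made up for illustration and is not part of the original answer.

```python
def escape_nested(s):
    """Escape apostrophes outside {% ... %} tags, assuming tags may nest."""
    depth = 0  # number of "{%" openers that have not been closed yet
    out = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair == "{%":
            depth += 1
            out.append(pair)
            i += 2
        elif pair == "%}" and depth > 0:
            depth -= 1
            out.append(pair)
            i += 2
        else:
            ch = s[i]
            # Only apostrophes at depth 0 are outside every tag.
            out.append("&quot;" if ch == "'" and depth == 0 else ch)
            i += 1
    return "".join(out)


print(escape_nested("{% 'ello %} wor'ld {% te'st %}"))
print(escape_nested("{% a {% b' %} c' %} d'"))
```

For input without nesting this should behave like the boolean version; a stray "%}" with no matching opener is copied through unchanged.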

edited Nov 06 '11 at 22:33

answered Nov 06 '11 at 22:19


+1: regexs are the wrong tool here. You need to fix your function though. Apostrophes *not* in the tags should be escaped, so ``if isIn and s[i]=="'"...`` should be ``if not isIn...``. – Blair Nov 06 '11 at 22:26

Ehsan Foroughi · Answer 2 · 2011-11-06T22:38:33.313

3

Just for fun, this is the way to do it with regex:

>>> import re
>>> instr = "{% 'ello %} wor'ld {% te'st %}"
>>> re.sub(r'\'(?=(.(?!%}))*({%|$))', r'&quot;', instr)
"{% 'ello %} wor&quot;ld {% te'st %}"

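It uses a positive lookahead that must reach either {% or the end of the string, and a negative lookahead nested inside it to make sure the scan never crosses a %}. In other words, an apostrophe matches only if the next tag delimiter after it is an opening {%, or if there is no delimiter left at all.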
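The same idea can be spelled out with re.VERBOSE so that each part of the lookahead is labelled. This is a sketch of the pattern above with non-capturing groups swapped in; the name OUTSIDE_APOSTROPHE is chosen here for illustration only.

```python
import re

# Verbose rewrite of the lookahead pattern; whitespace and comments
# inside the pattern are ignored because of re.VERBOSE.
OUTSIDE_APOSTROPHE = re.compile(r"""
    '                   # an apostrophe ...
    (?=                 # ... that is followed by
        (?:.(?!%}))*    #   characters, none of them directly before a closing tag
        (?:{%|$)        #   and then an opening tag or the end of the string
    )
""", re.VERBOSE)

text = "{% 'ello %} wor'ld {% te'st %}"
result = OUTSIDE_APOSTROPHE.sub("&quot;", text)
print(result)
```

Note that . does not match newlines by default, so multi-line input would probably need re.DOTALL as well.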

edited Nov 06 '11 at 22:38

answered Nov 06 '11 at 22:32


poke · Answer 3 · 2011-11-06T22:44:53.313

If you want to use a regular expression, you could do it like this:

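>>> import re
>>> s = """'{% 'ello %} wor'ld {% te'st %}'"""
>>> segments = re.split(r'(\{%.*?%\})', s)
>>> # With a capturing group, the tags land at the odd indices,
>>> # so the even indices hold the text outside the tags.
>>> for i in range(0, len(segments), 2):
...     segments[i] = segments[i].replace('\'', '&quot;')
...
>>> ''.join(segments)
"&quot;{% 'ello %} wor&quot;ld {% te'st %}&quot;"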

Compared with Ehsan’s lookahead solution, this has the benefit that you can run any kind of replacement or analysis on the segments without running another regular expression. So if you decide to replace another character as well, you can easily do that in the same loop.
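As a rough sketch of that flexibility, the split can be wrapped in a helper that applies any function to the text outside the tags. The names transform_outside_tags and escape_segment are hypothetical, and the extra replacement of < is only there to illustrate handling a second character.

```python
import re

# The capturing group makes re.split keep the tags in the result list.
TAG = re.compile(r"(\{%.*?%\})")


def transform_outside_tags(text, transform):
    """Apply transform to every segment that lies outside {% ... %} tags."""
    segments = TAG.split(text)
    # Even indices are outside tags, odd indices are the tags themselves.
    segments[::2] = [transform(seg) for seg in segments[::2]]
    return "".join(segments)


def escape_segment(segment):
    # Several replacements can be chained here without another regex pass.
    return segment.replace("'", "&quot;").replace("<", "&lt;")


print(transform_outside_tags("{% 'ello %} wor'ld <b> {% te'st %}", escape_segment))
```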

score 0 · Answer 4 · edited May 23 '17 at 10:26

bcloughlan, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a general question about how to exclude patterns in regex.)

Here's a simple regex:

{%.*?%}|(\')

The left side of the alternation matches complete {% ... %} tags. We will ignore these matches. The right side matches and captures apostrophes to Group 1, and we know they are the right apostrophes because they were not matched by the expression on the left.

This program shows how to use the regex:

import re
subject = "{% 'ello %} wor'ld {% te'st %}"
regex = re.compile(r'{%.*?%}|(\')')
def myreplacement(m):
    if m.group(1):
        return "&quot;"
    else:
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print(replaced)
