2

I'm a bit stuck with a regular expression. I have a string in the format

{% 'ello %} wor'ld {% te'st %}

and I want to escape only apostrophes that aren't between {% ... %} tags, so the expected output is

{% 'ello %} wor&quot;ld {% te'st %}

I know I can replace all of them using the string replace function, but I'm at a loss as to how to use regexes to match only those outside the braces.

Matt Ball
bcoughlan

4 Answers

5

This can probably be done with regex, but it would be a complicated one. It's easier to write and read if you just do it directly:

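def escape(s):
    isIn = False  # True while we are between {% and %}
    ret = []
    for i in range(len(s)):
        # Escape the apostrophe only when we are outside a tag
        if not isIn and s[i]=="'": ret += ["&quot;"]
        else: ret += s[i:i+1]

        # Update the state when a tag closes or opens at this position
        if isIn and s[i:i+2]=="%}": isIn = False
        if not isIn and s[i:i+2]=="{%": isIn = True

    return "".join(ret)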
Petar Ivanov
  • +1: regexes are the wrong tool here. You need to fix your function though. Apostrophes *not* in the tags should be escaped, so ``if isIn and s[i]=="'"...`` should be ``if not isIn...``. – Blair Nov 06 '11 at 22:26
3

Just for fun, this is the way to do it with regex:

>>> import re
>>> instr = "{% 'ello %} wor'ld {% te'st %}"
>>> re.sub(r'\'(?=(.(?!%}))*({%|$))', r'&quot;', instr)
"{% 'ello %} wor&quot;ld {% te'st %}"

It uses a positive lookahead to find either the next {% or the end of the string, and a negative lookahead inside that positive lookahead to make sure the scan forward never crosses a %}. An apostrophe inside a tag fails the test because a %} appears before any {% or the end of the string.
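To make the pieces of that pattern easier to follow, here is a sketch of the same regex in verbose mode. The names `OUTSIDE_APOSTROPHE` and `escape_outside_tags` are made up for illustration, and the capturing groups are swapped for non-capturing ones, which should not change what is matched.

```python
import re

# The same lookahead pattern, spread out with re.VERBOSE so each part can be
# commented. OUTSIDE_APOSTROPHE is a hypothetical name, not from the answer.
OUTSIDE_APOSTROPHE = re.compile(r"""
    '                   # an apostrophe ...
    (?=                 # ... from which we can look ahead over
        (?:.(?!%}))*    #     characters that are never directly followed by %}
        (?:{%|$)        #     until we reach an opening tag or the end of the string
    )
""", re.VERBOSE)

def escape_outside_tags(text):
    """Replace apostrophes that are not inside {% ... %} with &quot;."""
    return OUTSIDE_APOSTROPHE.sub("&quot;", text)

print(escape_outside_tags("{% 'ello %} wor'ld {% te'st %}"))
```

One caveat: `.` does not match newlines by default, so multi-line input would probably need `re.DOTALL` added to the flags.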

Ehsan Foroughi
2

If you want to use a regular expression, you could do it like this:

>>> import re
>>> s = """'{% 'ello %} wor'ld {% te'st %}'"""
>>> segments = re.split( r'(\{%.*?%\})', s )
>>> for i in range( 0, len( segments ), 2 ):
...     segments[i] = segments[i].replace( '\'', '&quot;' )
...
>>> ''.join( segments )
"&quot;{% 'ello %} wor&quot;ld {% te'st %}&quot;"

Compared with Ehsan’s lookahead solution, this has the benefit that you can run any kind of replacement or analysis on the segments without running another regular expression. Because the split pattern has a capturing group, the tags land at the odd indices and the text between them at the even indices, so if you decide to replace another character, you can easily do that in the same loop.
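As a rough sketch of that last point, the loop can take several replacements at once. The function name `escape_outside_tags`, the `replacements` dict, and the extra `<` replacement are assumptions added for illustration only.

```python
import re

# The capturing group makes re.split keep the tags in the result list.
TAG = re.compile(r"({%.*?%})")

def escape_outside_tags(text, replacements):
    """Apply every old -> new replacement to the text outside {% ... %} tags."""
    segments = TAG.split(text)
    # Even indices hold the text between tags; odd indices hold the tags.
    for i in range(0, len(segments), 2):
        for old, new in replacements.items():
            segments[i] = segments[i].replace(old, new)
    return "".join(segments)

result = escape_outside_tags(
    "{% 'ello %} wor'ld <b> {% te'st %}",
    {"'": "&quot;", "<": "&lt;"},
)
print(result)
```

The replacements are applied in dict order, so a replacement whose output contains another key (for example `&` after `'`) would need to be ordered with care.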

poke
0

bcoughlan, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a general question about how to exclude patterns in regex.)

Here's a simple regex:

{%.*?%}|(\')

The left side of the alternation matches complete {% ... %} tags. We will ignore these matches. The right side matches and captures apostrophes to Group 1, and we know they are the right apostrophes because they were not matched by the expression on the left.

This program shows how to use the regex (see the results in the online demo):

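import re
subject = "{% 'ello %} wor'ld {% te'st %}"
regex = re.compile(r'{%.*?%}|(\')')
def myreplacement(m):
    # Group 1 is set only when the apostrophe branch matched
    if m.group(1):
        return "&quot;"
    else:
        # A whole {% ... %} tag matched: put it back unchanged
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print(replaced)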
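To see why checking Group 1 is enough, it may help to list what each match actually contains. This is only an illustrative sketch using the same regex; the `matches` list and the printed labels are not part of the original program.

```python
import re

regex = re.compile(r"{%.*?%}|(')")
subject = "{% 'ello %} wor'ld {% te'st %}"

# For every match, record the matched text and whether Group 1 took part.
# Tags are consumed whole by the left branch, so the apostrophes inside them
# never get a chance to match the right branch.
matches = [(m.group(0), m.group(1) is not None) for m in regex.finditer(subject)]

for text, is_target in matches:
    print(repr(text), "-> replace" if is_target else "-> keep as-is")
```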

Reference

  1. How to match pattern except in situations s1, s2, s3
  2. How to match a pattern unless...
Community
zx81