I'm trying to write a regex to replace strings if not surrounded by single quotes. For example I want to replace FOO with XXX in the following string:

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

the desired output is:

output = "' FOO ' abc 123 ' def FOO ghi 345 ' XXX '' XXX ' lmno 678 FOO '"

My current regex is:

myregex = re.compile("(?<!')+( FOO )(?!')+", re.IGNORECASE)

I think I have to use look-around operators, but I don't understand how... regexes are too complicated for me :D

Can you help me?

LJNielsenDk
daveoncode

2 Answers

Here's how it could be done:

import re

def replace_FOO(m):
    if m.group(1) is None:
        return m.group()

    return m.group().replace("FOO", "XXX")

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

output = re.sub(r"'[^']*'|([^']*)", replace_FOO, string)

print(string)
print(output)

[EDIT]

The re.sub function will accept as a replacement either a string template or a function. If the replacement is a function, every time it finds a match it'll call the function, passing the match object, and then use the returned value (which must be a string) as the replacement string.
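As a minimal sketch of the function form, unrelated to the FOO problem itself: the callback name `shout` and the sample text are made up for illustration, and the only library call used is `re.sub`.

```python
import re

def shout(match):
    # Called once per match; the returned string replaces the matched text.
    return match.group().upper()

# Upper-case every word that is exactly three characters long.
result = re.sub(r"\b\w{3}\b", shout, "foo bar quux")
print(result)
```

Text that the pattern does not match is copied to the result unchanged, which is why "quux" keeps its case.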

As for the pattern itself, as it searches, if there's a ' at the current position it'll match up to and including the next ', otherwise it'll match up to but excluding the next ' or the end of the string.

The replacement function will be called on each match and return the appropriate result.
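To see how the pattern splits the input, here is a rough sketch that lists the pieces with `re.findall`. It assumes a variant of the pattern with no capture group and with `+` instead of `*`, so empty matches are skipped; the names `pieces` and `kind` are illustrative only.

```python
import re

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

# Either a complete quoted run ('...') or a run of characters containing no quote.
pieces = re.findall(r"'[^']*'|[^']+", string)

for piece in pieces:
    # A piece that starts with a quote is a quoted run and must stay untouched.
    kind = "quoted" if piece.startswith("'") else "unquoted"
    print(kind, repr(piece))
```

Joining the pieces back together gives the original string, so replacing only inside the unquoted pieces leaves everything else intact.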

Actually, now I think about it, I don't need to use a group at all. I could do this instead:

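def replace_FOO(m):
    # A match that starts with a quote is a quoted run: leave it unchanged.
    if m.group().startswith("'"):
        return m.group()

    # Otherwise it is unquoted text, so do the replacement.
    return m.group().replace("FOO", "XXX")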

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

output = re.sub(r"'[^']*'|[^']+", replace_FOO, string)
MRAB
  • Does not work for me, I get `' FOO '' def FOO ghi 345 '''' lmno 678 FOO '` as the output (the "XXX" are gone) – Adam Parkin Aug 03 '12 at 18:20
  • It works as expected for me (Python 2.7.1) thanks a lot! It would be very useful if you could explain the code, since I'm a Python and regex newbie :P – daveoncode Aug 11 '12 at 21:02

This is hard to do without variable-length lookbehind. I'm not sure whether Python's regex engine supports it. Anyway, a simple solution is the following:

Use this regex: (?:[^'\s]\s*)(FOO)(?:\s*[^'\s])

The first capture group should return the right result.

If there is always a quote with a single space after it, as in your example, you can use a fixed-length lookbehind: (?<=[^'\s]\ )FOO(?=\s*[^'\s]), which will match exactly the one you want.

davidrac
  • Python's standard regex library 're' doesn't support variable-length lookbehinds, but there is an alternative regex library on PyPI that does, at http://pypi.python.org/pypi/regex. – MRAB Aug 03 '12 at 18:20