Python replace, using patterns in array

Question

I need to replace some things in a string using an array, they can look like this:

array = [3, "$x" , "$y", "$hi_buddy"]
#the first number is number of things in array
string = "$xena is here $x and $y."

I've got another array with the replacements for those things, let's say it's called rep_array.

rep_array = [3, "A", "B", "C"]

For the replacement I use this:

for x in range (1, array[0] + 1):
  string = string.replace(array[x], rep_array[x])

But the result is:

string = "Aena is here A and B."

But I need to match only a standalone $x, not $x inside another word. The result should look like this:

string = "$xena is here A and B."

Note that:

all patterns in array start with $.
a pattern matches if it matches the whole word after $; $xena doesn't match $x, but foo$x would match.
$ can be escaped with @, and then it should not be matched (for example, $x does not match @$x).

why don't you use [string formats](http://ebeab.com/2012/10/10/python-string-format/) from python, instead of reinventing the wheel? — zmo, Apr 14 '14 at 09:28
@MartijnPieters: What about ` \$x ` and in the replacement array you would have ` A `? — npinti, Apr 14 '14 at 09:33
You could use a pattern like `r'\$x\b'` to replace only the `$x` which is not followed by more word characters (i. e. `$x` in `$x and $y` but not in `$xena`). If things like `bla$x foo` also should not be replaced, then you could use something like `r'((?<=\W)|^)\$x\b'` for matching only this. — Alfe, Apr 14 '14 at 09:35
@npinti, with your approach you would not replace `$x` at the beginning of the string. — Alfe, Apr 14 '14 at 09:36
@Alfe: This is why I asked if the pattern always starts with `$`. — Martijn Pieters, Apr 14 '14 at 09:36
+ I can't really change the arrays themselves, because they are part of a massive program. — , Apr 14 '14 at 09:45

score 5 · Answer 1 · edited May 23 '17 at 12:07

This is not a direct answer to your question, but as I guess you'll get other solutions hacking around \b, I'm going to suggest a more Pythonic solution:

rep_dict = {'x': 'A', 'y': 'B', 'hi_buddy': 'C'}
string = '{xena} is here {x} and {y}'

# unpack the dict so that its keys become named format fields
print(string.format(**rep_dict))

But here it will raise a KeyError because xena is missing from rep_dict. That can be solved, as the answers to that question show, with a defaultdict or a custom formatter, depending on your use case.
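One way to tolerate the missing key is sketched below. It assumes Python 3, where str.format_map is available, and the KeepMissing class name is made up for this illustration:

```python
class KeepMissing(dict):
    """Hypothetical helper: unknown fields are put back as '{name}'."""

    def __missing__(self, key):
        # Called by dict lookup when the key is absent.
        return '{' + key + '}'


rep_dict = {'x': 'A', 'y': 'B', 'hi_buddy': 'C'}
template = '{xena} is here {x} and {y}'

# format_map looks fields up in the mapping directly, so __missing__ is honoured.
result = template.format_map(KeepMissing(rep_dict))
print(result)
```

Passing the mapping with `**` instead would copy it into a plain dict, and the fallback would be lost.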

The problem with using $ is that, without an explicit delimiter, it is not trivial to tell where the variable name ends. Most languages using $ variables (those are shells and makefiles) apply it to the next single character only, and require an explicit boundary for longer names, i.e. ${xena}. Languages like Perl use a grammar to define the context of a $ variable, and I guess they may use regexps in the tokenizer as well.

That's why in Python we use only the format delimiters {} to mark the boundaries of a variable in the string, without a redundant $, so we do not have to deal with ambiguities ($xena => ${x}ena or ${xena}?).

HTH

of course, I'm giving this for the OP to know and consider using that if it can be an option to him, and for future readers that may consider using a `$` variable in strings for a use case that strings formats have been built for. ;-) — zmo, Apr 14 '14 at 09:47
This is the correct TOOWTDI (https://wiki.python.org/moin/TOOWTDI), if the OP has any power over the input strings. — Davidmh, Apr 14 '14 at 11:33

Martijn Pieters · Accepted Answer · 2014-04-14T10:13:36.313

Use a regular expression that escapes each source pattern and appends a \b word-boundary anchor to it:

import re

for pattern, replacement in zip(array[1:], rep_array[1:]):
    pattern = r'{}\b'.format(re.escape(pattern))
    string = re.sub(pattern, replacement, string)

This uses re.escape() to ensure any regular expression meta characters in the pattern are escaped first. zip() is used to pair up your patterns and replacement values; a more pythonic alternative to your range() loop.

\b only matches at a position where a word character is followed by a non-word character (or vice versa), a word boundary. Your patterns all end in a word character, so this makes sure your patterns only match if the next character is not a word character, blocking $x from matching inside $xena.

Demo:

>>> import re
>>> array = [3, "$x" , "$y", "$hi_buddy"]
>>> rep_array = [3, "A", "B", "C"]
>>> string = "$xena is here $x and $y. foo$x matches too!"
>>> for pattern, replacement in zip(array[1:], rep_array[1:]):
...     pattern = r'{}\b'.format(re.escape(pattern))
...     string = re.sub(pattern, replacement, string)
... 
>>> print string
$xena is here A and B. fooA matches too!

Your solution is almost working the way I need. I also need to be able to escape the $ with @; can you tell me what's wrong with this pattern? pattern = r'(?:([^@])|^){}\b'.format(re.escape(pattern)) — , Apr 14 '14 at 11:25
@Whitedracke: You need a look-behind: `r'(?:(?<=[^@])|^){}\b'` — Martijn Pieters, Apr 14 '14 at 11:27
@Whitedracke: or better still, a negative look-behind: `r'(?<!@){}\b'` — Martijn Pieters, Apr 14 '14 at 11:32
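
Putting the negative look-behind from the comments together with the \b anchor could look like the sketch below. The replace_patterns helper name and the sample string are illustrative only; the replacement is returned from a function so that re.sub does not interpret any backslashes in it.

```python
import re


def replace_patterns(text, patterns, replacements):
    """Replace each pattern unless it is preceded by '@' or runs on into a longer word."""
    for pattern, replacement in zip(patterns, replacements):
        # (?<!@) rejects an escaped '@$x'; \b rejects '$x' inside '$xena'.
        regex = r'(?<!@){}\b'.format(re.escape(pattern))
        text = re.sub(regex, lambda match: replacement, text)
    return text


array = [3, "$x", "$y", "$hi_buddy"]
rep_array = [3, "A", "B", "C"]
string = "$xena is here $x and $y. foo$x matches, @$x does not."

# Skip the leading count stored at index 0 of both arrays.
result = replace_patterns(string, array[1:], rep_array[1:])
print(result)
```

Because the patterns are applied one after another, a replacement value that itself contains a later pattern would be replaced again.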

score 0 · Answer 3 · answered Apr 14 '14 at 09:39

string.replace does not know about regular expressions, so you have to use the re module (https://docs.python.org/3.4/library/re.html), namely the re.sub function:

>>> import re
>>> re.sub(r"\$x\b", "replace", r"$xenia $x")
'$xenia replace'

Jasper

This'll match `$x` in `foo$x` too. – Martijn Pieters Apr 14 '14 at 09:44
I need foo$x to be replaced, but I don't know how to get the escape '\' into the arrays. – Apr 14 '14 at 09:52
@Whitedracke: That's an important detail; *do* include that in your question post! – Martijn Pieters Apr 14 '14 at 10:01
@Whitedracke: I updated your post to include that detail, as well as the fact that all patterns start with `$`. It's details like that that make a *huge* difference in what is a proper solution and what is not. – Martijn Pieters Apr 14 '14 at 10:09

score 0 · Answer 4 · answered Apr 14 '14 at 09:49

You can also try something like this:

import re

search = ["$x" , "$y", "$hi_buddy"]
replace = ["A", "B", "C"]
string = "$xena is here $x and $y skip$x."

repl = dict(zip(search, replace))
print re.sub(r'\B\$\w+', lambda m: repl.get(m.group(0), m.group(0)), string)

# result: $xena is here A and B skip$x.

\B here means "match $ only when it is preceded by a non-word char or sits at the start of the string". If you need skip$x to be replaced as well, just drop the \B:

print re.sub(r'\$\w+', lambda m: repl.get(m.group(0), m.group(0)), string)
# $xena is here A and B skipA.
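
The same single-pass idea can be extended to honour the @ escape from the question. This is only a sketch: the replace_vars name and the sample sentence are made up for illustration, and the print call is written so it also runs on Python 3.

```python
import re


def replace_vars(text, mapping):
    """Replace every $name token found in mapping, scanning the text once."""
    def lookup(match):
        token = match.group(0)
        # Unknown tokens such as '$xena' are returned unchanged.
        return mapping.get(token, token)

    # (?<!@) skips a '$' that is escaped by a preceding '@'.
    return re.sub(r'(?<!@)\$\w+', lookup, text)


search = ["$x", "$y", "$hi_buddy"]
replace = ["A", "B", "C"]
repl = dict(zip(search, replace))

result = replace_vars("$xena is here $x and $y, foo$x but not @$x.", repl)
print(result)
```

Since each token is looked up exactly once, a replacement value can never be replaced a second time, unlike with a loop of successive substitutions.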

gog

Using `\B` means `$$x` is also matched. – Martijn Pieters Apr 14 '14 at 09:58
@MartijnPieters: right, and so do `!$x`, `...$x` and similar. I didn't understand from the question if this is a desired behavior or not. – gog Apr 14 '14 at 10:04
