
I need to replace some placeholders in a string using an array. They can look like this:

array = [3, "$x" , "$y", "$hi_buddy"]
#the first number is number of things in array
string = "$xena is here $x and $y."

I've got another array with the replacement values; let's say it's called rep_array.

rep_array = [3, "A", "B", "C"]

For the replacement I use this:

for x in range (1, array[0] + 1):
  string = string.replace(array[x], rep_array[x])

But the result is:

string = "Aena is here A and B."

But I need to match only a standalone $x, not a $x that is part of a longer word. The result should look like this:

string = "$xena is here A and B."

Note that:

  • all patterns in array start with $.
  • a pattern matches if it matches the whole word after $; $xena doesn't match $x, but foo$x would match.
  • $ can be escaped with @, and then it should not be matched (for example, $x does not match @$x)

4 Answers


This is not a direct answer to your question, but since I guess you'll get other solutions hacking around \b, I'm going to suggest a more Pythonic solution:

rep_dict = {'x': 'A', 'y': 'B', 'hi_buddy': 'C'}
string = '{xena} is here {x} and {y}'

print(string.format(**rep_dict))  # unpack the dict into keyword arguments

Here, though, it will raise a KeyError because xena is missing from rep_dict. That can be solved by the answers to that question, using a defaultdict or a custom formatter, depending on your use case.
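
As a minimal sketch of the "custom formatter" idea, assuming you want unknown placeholders left untouched rather than raising KeyError: the class name KeepMissing and the variable name template are made up for this illustration. It relies on string.Formatter().vformat() looking keys up with mapping[key], so a dict subclass with __missing__ is consulted for absent keys.

```python
from string import Formatter


class KeepMissing(dict):
    """Hypothetical helper: a dict that echoes unknown keys back as {key}."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        return '{' + key + '}'


rep_dict = KeepMissing({'x': 'A', 'y': 'B', 'hi_buddy': 'C'})
template = '{xena} is here {x} and {y}'

# vformat() takes the positional args and the mapping explicitly,
# so the mapping is not copied into a plain dict first.
result = Formatter().vformat(template, (), rep_dict)
print(result)
```

Known names are substituted and {xena} survives as written, so the string can be formatted again later if more values become available.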

The problem with using $ is that it marks only where a variable starts, not where it ends, so matching it reliably is not trivial. Most languages that use $ variables (shells and makefiles, for example) apply it to the next single character and require explicit delimiters for longer names, e.g. ${xena}. Languages like Perl use a grammar to define the context of a $ variable, and I guess they may use regexps in the tokenizer as well.

That's why in Python we use the formatting braces {} alone to mark the boundaries of a variable in the string, with no redundant $, so we do not have to deal with ambiguities ($xena => ${x}ena or ${xena}?).

HTH

zmo
  • Of course, I'm offering this so the OP can consider it if it is an option for them, and for future readers who may be considering `$` variables in strings for a use case that string formatting was built for. ;-) – zmo Apr 14 '14 at 09:47
  • This is the correct TOOWTDI (https://wiki.python.org/moin/TOOWTDI), if the OP has any power over the input strings. – Davidmh Apr 14 '14 at 11:33

Use a regular expression that escapes each pattern and appends a \b word-boundary anchor:

import re

for pattern, replacement in zip(array[1:], rep_array[1:]):
    pattern = r'{}\b'.format(re.escape(pattern))
    string = re.sub(pattern, replacement, string)

This uses re.escape() to ensure any regular expression meta characters in the pattern are escaped first. zip() is used to pair up your patterns and replacement values; a more pythonic alternative to your range() loop.

\b only matches at a position where a word character is followed by a non-word character (or vice versa), a word boundary. Your patterns all end in a word character, so this makes sure your patterns only match if the next character is not a word character, blocking $x from matching inside $xena.

Demo:

>>> import re
>>> array = [3, "$x" , "$y", "$hi_buddy"]
>>> rep_array = [3, "A", "B", "C"]
>>> string = "$xena is here $x and $y. foo$x matches too!"
>>> for pattern, replacement in zip(array[1:], rep_array[1:]):
...     pattern = r'{}\b'.format(re.escape(pattern))
...     string = re.sub(pattern, replacement, string)
... 
>>> print string
$xena is here A and B. fooA matches too!
Martijn Pieters
  • Your solution almost works the way I need, but I also need to be able to escape the $ with @. Can you tell me what's wrong with this pattern? pattern = r'(?:([^@])|^){}\b'.format(re.escape(pattern)) –  Apr 14 '14 at 11:25
  • @Whitedracke: You need a look-behind: `r'(?:(?<=[^@])|^){}\b'` – Martijn Pieters Apr 14 '14 at 11:27
  • @Whitedracke: or better still, a negative look-behind: `r'(?<!@){}\b'` – Martijn Pieters Apr 14 '14 at 11:32
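
Putting the negative look-behind from the comments together with the loop from the answer, a minimal sketch might look like the following. The variable names text and regex and the extended sample sentence are assumptions for this illustration. The replacement is passed as a callable so that a backslash in a replacement value would not be interpreted by re.sub().

```python
import re

array = [3, "$x", "$y", "$hi_buddy"]
rep_array = [3, "A", "B", "C"]
text = "$xena is here $x and $y. foo$x matches, @$x does not."

for pattern, replacement in zip(array[1:], rep_array[1:]):
    # (?<!@) rejects a match preceded by the escape character @;
    # \b rejects a match that continues into a longer word.
    regex = r'(?<!@){}\b'.format(re.escape(pattern))
    # Bind the current replacement as a default argument of the lambda.
    text = re.sub(regex, lambda m, r=replacement: r, text)

print(text)
# $xena is here A and B. fooA matches, @$x does not.
```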

string.replace does not know about regular expressions, so you have to use the re module (https://docs.python.org/3.4/library/re.html), namely the re.sub function:

>>> import re
>>> re.sub(r"\$x\b", "replace", r"$xenia $x")
'$xenia replace'
Jasper

You can also try something like this:

import re

search = ["$x" , "$y", "$hi_buddy"]
replace = ["A", "B", "C"]
string = "$xena is here $x and $y skip$x."

repl = dict(zip(search, replace))
print re.sub(r'\B\$\w+', lambda m: repl.get(m.group(0), m.group(0)), string)

# result: $xena is here A and B skip$x.

\B here means "match $ only when it's preceded by a non-word char (or sits at the start of the string)". If you need skip$x to be replaced as well, just drop the \B:

print re.sub(r'\$\w+', lambda m: repl.get(m.group(0), m.group(0)), string)
# $xena is here A and B skipA.
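
If the @ escape from the question also has to be honoured, the same single-pass idea can be combined with a negative look-behind. This is only a sketch: the function name substitute and the extended sample string are made up for this illustration.

```python
import re


def substitute(text, mapping):
    """Replace each unescaped $name that appears in mapping, in one pass."""
    def lookup(match):
        token = match.group(0)
        # Unknown tokens such as $xena are returned unchanged.
        return mapping.get(token, token)

    # (?<!@) skips any $ that is escaped by a preceding @.
    return re.sub(r'(?<!@)\$\w+', lookup, text)


repl = {"$x": "A", "$y": "B", "$hi_buddy": "C"}
result = substitute("$xena is here $x and $y skip$x, but not @$x.", repl)
print(result)
# $xena is here A and B skipA, but not @$x.
```

Because every token is matched as a whole word and looked up once, the text inserted by one replacement is never rescanned by a later one.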
gog