94

I have to replace text like "north", "south", etc. with "N", "S" etc. in address fields. I thought of making a dictionary to hold the replacements. Suppose we have:

replacements = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
address = "123 north anywhere street"

Can I use the replacements dictionary to do all the replacements, for example by iterating over it? What would the code for this look like?

Karl Knechtel
user1947457
  • 3
    this is pretty tricky if matches can overlap. See [this question](http://stackoverflow.com/questions/10931150/phps-strtr-for-python) – georg Jan 04 '13 at 11:45
  • A BIG part of the problem is that the string `replace()` method _returns_ a copy of string with occurrences replaced -- it doesn't do it in-place. – martineau Jan 04 '13 at 13:26
  • 3
    You can simply use [str.translate](https://docs.python.org/3/library/stdtypes.html#str.translate). – Neel Patel Sep 29 '19 at 11:54
  • 3
    See https://stackoverflow.com/questions/2400504/easiest-way-to-replace-a-string-using-a-dictionary-of-replacements for the best solution – Ethan Bradford Feb 28 '20 at 01:19
  • Upon review: after removing the code attempt (which had multiple issues to the point where it might as well be pseudocode; and which **doesn't help understand the question** because this is clearly meant as a how-to question rather than a debugging question), this is fairly clearly a duplicate of the question @EthanBradford found. Even after my edit, I think the other question (and its answers) is overall higher quality, so I closed this as a duplicate. – Karl Knechtel Feb 04 '23 at 18:10
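To illustrate two of the points raised in the comments above, here is a minimal sketch (the variable names are illustrative only). It shows that `str.replace()` returns a new string rather than changing the original, and that a `str.translate()` table built with `str.maketrans()` maps single characters only, so whole-word keys such as "north" cannot be used with it directly.

```python
address = "123 north anywhere street"

# str.replace() returns a new string; the original is left untouched.
address.replace("north", "N")
unchanged = address

# The result has to be assigned to a name to keep it.
replaced = address.replace("north", "N")

# str.maketrans() accepts only single-character string keys in a dict,
# so a table built from whole words is rejected.
try:
    str.maketrans({"north": "N"})
    maketrans_error = None
except ValueError as exc:
    maketrans_error = exc
```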

13 Answers

69
    address = "123 north anywhere street"
    
    for word, initial in {"NORTH": "N", "SOUTH": "S"}.items():
        address = address.replace(word.lower(), initial)
    print(address)

nice and concise and readable too.

Rich Tier
  • This seems to be the standard approach. I was curious how the XML parsers do it, and the same approach is seen in: `import xml.sax.saxutils as su; print(inspect.getsource(su.escape))` which leads us to `print(inspect.getsource(su.__dict_replace))` – C8H10N4O2 Mar 06 '18 at 18:03
  • Consider reversing this and iterating over your string instead if your dictionary is larger. – Akaisteph7 Dec 02 '22 at 21:59
29

you are close, actually:

address = "123 north anywhere street"
dictionary = {"NORTH": "N", "SOUTH": "S"}
for key in dictionary.iterkeys():
    address = address.upper().replace(key, dictionary[key])

Note: for Python 3 users, you should use .keys() instead of .iterkeys():

address = "123 north anywhere street"
dictionary = {"NORTH": "N", "SOUTH": "S"}
for key in dictionary.keys():
    # upper() makes the keys match, but it also upper-cases the rest of the address
    address = address.upper().replace(key, dictionary[key])
CharlesG
Samuele Mattiuzzo
  • 1
    Very simple and effective for replacing against a dictionary. For me, just this was enough: – Alexandre Andrade May 08 '18 at 03:17
  • Concise and simple to understand. Exactly enough for me. – msarafzadeh Jun 13 '19 at 11:17
  • 12
    How is this correct? `address.upper().replace(...)` doesn't modify anything in place, it just returns a value, and it's not being assigned to anything. – Enrico Borba Nov 03 '19 at 02:05
  • 2
    If you want you can iterate through the dictionary's key and values at the same time using `for key, value in dictionary.items()`. I don't know whether it has advantages in terms of performance, but I think it is more pythonic – gionni Nov 25 '20 at 11:29
  • The downside of the for loop is that it creates replacement ordering problems, e.g. when you have the string `Do you like café? No, I prefer tea.` and you do .replace("café", "tea") and .replace("tea", "café"), you will get `Do you like café? No, I prefer café.`. If the replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café". See, for example, this question: https://stackoverflow.com/a/15221068/13968392 – mouwsy Nov 06 '21 at 21:34
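The ordering problem described in the last comment can be reproduced with a short sketch. The names `swaps`, `looped` and `single_pass` are illustrative; the single-pass version uses one `re.sub()` call over an alternation of the keys, which is one possible way to avoid the issue rather than the only one.

```python
import re

text = "Do you like café? No, I prefer tea."
swaps = {"café": "tea", "tea": "café"}

# Sequential replace: the second pass also rewrites the output of the first.
looped = text
for old, new in swaps.items():
    looped = looped.replace(old, new)

# Single pass: each match is looked up once, against the original text.
pattern = re.compile("|".join(re.escape(k) for k in swaps))
single_pass = pattern.sub(lambda m: swaps[m.group(0)], text)

print(looped)       # Do you like café? No, I prefer café.
print(single_pass)  # Do you like tea? No, I prefer café.
```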
26

One option I don't think anyone has yet suggested is to build a regular expression containing all of the keys and then simply do one replace on the string:

>>> import re
>>> l = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
>>> pattern = '|'.join(sorted(re.escape(k) for k in l))
>>> address = "123 north anywhere street"
>>> re.sub(pattern, lambda m: l.get(m.group(0).upper()), address, flags=re.IGNORECASE)
'123 N anywhere street'
>>> 

This has the advantage that the regular expression can ignore the case of the input string without modifying it.

If you want to operate only on complete words then you can do that too with a simple modification of the pattern:

>>> pattern = r'\b({})\b'.format('|'.join(sorted(re.escape(k) for k in l)))
>>> address2 = "123 north anywhere southstreet"
>>> re.sub(pattern, lambda m: l.get(m.group(0).upper()), address2, flags=re.IGNORECASE)
'123 N anywhere southstreet'
Duncan
  • I am quite new to the regular expression and was hoping if you can explain what exactly is happening with lambda and group function. I noticed you also did sorted function. I have multiple keys for which the words are to be replaced by their value, in that case will the sorted function affect anything? Is it really necessary so for example there could be some words in the text file which are present on different intervals/lines – trillion Sep 04 '20 at 10:08
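As a rough answer to the comment above, the lambda can be spelled out as a named function. This is a sketch only: `lookup` is a hypothetical name, and the multi-line `text` is made up for illustration. `re.sub()` calls the function once per match, wherever the match occurs in the text (including on different lines), and `group(0)` is the exact text that matched. With these four keys, `sorted()` only fixes the order of the alternatives; order would start to matter if one key were a prefix of another, because the regex engine tries alternatives from left to right.

```python
import re

l = {'NORTH': 'N', 'SOUTH': 'S', 'EAST': 'E', 'WEST': 'W'}

def lookup(match):
    # match.group(0) is the exact text that matched, e.g. "north".
    found = match.group(0)
    # The dictionary keys are upper case, so normalise before the lookup.
    return l[found.upper()]

# One alternation containing every key: 'EAST|NORTH|SOUTH|WEST'.
pattern = '|'.join(sorted(re.escape(k) for k in l))

text = "north of the river\nthen head WEST"
result = re.sub(pattern, lookup, text, flags=re.IGNORECASE)
```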
11

You are probably looking for iteritems() (items() in Python 3):

d = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
address = "123 north anywhere street"

for k,v in d.items():
    address = address.upper().replace(k, v)

address is now '123 N ANYWHERE STREET'


Well, if you want to preserve case, whitespace and nested words (e.g. Southstreet should not be converted to Sstreet), consider using this simple generator expression:

import re

l = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}

address = "North 123 East Anywhere Southstreet    West"

new_address = ''.join(l[p.upper()] if p.upper() in l else p for p in re.split(r'(\W+)', address))

new_address is now

N 123 E Anywhere Southstreet    W
sloth
  • But this would end up changing the entire case of the address – Abhijit Jan 04 '13 at 11:46
  • Depends on whether the question is *iterate over a dictionary* or *do all the work for me*. – sloth Jan 04 '13 at 11:53
  • @Abhijit Nonetheless, I added an example of how to preserve case, whitespace and nested matches. – sloth Jan 04 '13 at 12:31
  • @Dominic - great suggestion about unintentionally skewing addresses such as Southstreet Rd. In rethinking this, is there a way to ignore the replace if I have an address such as South St.? Is there a RE that would ignore the replace in this case? – user1947457 Jan 07 '13 at 02:02
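One possible way to handle the "South St." case raised in the last comment is a negative lookahead, which skips a direction word when it is immediately followed by a street suffix. This is a sketch under assumptions: the `suffixes` list and the `abbreviate` helper are hypothetical and would need tuning for real address data.

```python
import re

l = {'NORTH': 'N', 'SOUTH': 'S', 'EAST': 'E', 'WEST': 'W'}

# (?!...) is a negative lookahead: the direction word matches only when it is
# NOT followed by whitespace plus a street suffix.
# The suffix list is an assumption made for this example.
suffixes = r'(?:st|street|rd|road|ave|avenue)'
pattern = re.compile(
    r'\b({})\b(?!\s+{}\b)'.format('|'.join(l), suffixes),
    re.IGNORECASE,
)

def abbreviate(address):
    # group(1) is the matched direction word; look it up by its upper-case form.
    return pattern.sub(lambda m: l[m.group(1).upper()], address)
```

With this pattern, "123 South St." is left alone because "South" acts as the street name, while "123 North Main St." becomes "123 N Main St.".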
11

"Translating" a string with a dictionary is a very common requirement. I propose a function that you might want to keep in your toolkit:

def translate(text, conversion_dict, before=None):
    """
    Translate words from a text using a conversion dictionary

    Arguments:
        text: the text to be translated
        conversion_dict: the conversion dictionary
        before: a function to transform the input
        (by default, it converts the text to lowercase)
    """
    # if empty:
    if not text: return text
    # preliminary transformation:
    before = before or str.lower
    t = before(text)
    for key, value in conversion_dict.items():
        t = t.replace(key, value)
    return t

Then you can write:

>>> a = {'hello':'bonjour', 'world':'tout-le-monde'}
>>> translate('hello world', a)
'bonjour tout-le-monde'
fralau
6

I would suggest using a regular expression instead of a simple replace. With replace, you risk replacing parts of longer words, which may not be what you want.

import json
import re

with open('filePath.txt') as f:
   data = f.read()

with open('filePath.json') as f:
   glossar = json.load(f)

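for word, initial in glossar.items():
   # re.escape() keeps regex metacharacters in a key (e.g. ".") from being read as pattern syntax
   data = re.sub(r'\b' + re.escape(word) + r'\b', initial, data)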

print(data)
Trafalgar
5
def replace_values_in_string(text, args_dict):
    for key in args_dict.keys():
        text = text.replace(key, str(args_dict[key]))
    return text
Artem Malikov
5

If you're looking for a concise way, you can go for reduce from functools:

from functools import reduce

str_to_replace = "The string for replacement."
replacement_dict = {"The ": "A new ", "for ": "after "}

str_replaced = reduce(lambda x, y: x.replace(*y), [str_to_replace, *list(replacement_dict.items())])
print(str_replaced)
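As a sketch of what the reduce call is doing, the same fold can be written with reduce's optional initial value instead of building a combined list, and compared with the equivalent plain loop. The names `via_reduce` and `via_loop` are illustrative.

```python
from functools import reduce

str_to_replace = "The string for replacement."
replacement_dict = {"The ": "A new ", "for ": "after "}

# reduce() with an explicit initial value: start from the string and
# fold each (old, new) pair into it with str.replace().
via_reduce = reduce(
    lambda text, pair: text.replace(*pair),
    replacement_dict.items(),
    str_to_replace,
)

# The equivalent plain loop.
via_loop = str_to_replace
for old, new in replacement_dict.items():
    via_loop = via_loop.replace(old, new)

print(via_reduce)  # A new string after replacement.
```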
m7s
3

Try,

import re
l = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}

address = "123 north anywhere street"

for k, v in l.items():  # items() works on Python 2 and 3; iteritems() is Python 2 only
    t = re.compile(re.escape(k), re.IGNORECASE)
    address = t.sub(v, address)
print(address)
Adem Öztaş
2

Neither replace() nor format() is very precise:

data =  '{content} {address}'
for k,v in {"{content}":"some {address}", "{address}":"New York" }.items():
    data = data.replace(k,v)
# results: some New York New York

'{ {content} {address}'.format(**{'content':'str1', 'address':'str2'})
# results: ValueError: unexpected '{' in field name

It is better to translate with re.sub() if you need precise replacement:

import re
def translate(text, kw, ignore_case=False):
    search_keys = map(lambda x:re.escape(x), kw.keys())
    if ignore_case:
        kw = {k.lower():kw[k] for k in kw}
        regex = re.compile('|'.join(search_keys), re.IGNORECASE)
        res = regex.sub( lambda m:kw[m.group().lower()], text)
    else:
        regex = re.compile('|'.join(search_keys))
        res = regex.sub( lambda m:kw[m.group()], text)

    return res

#'score: 99.5% name:%(name)s' %{'name':'foo'}
res = translate( 'score: 99.5% name:{name}', {'{name}':'foo'})
print(res)

res = translate( 'score: 99.5% name:{NAME}', {'{name}':'foo'}, ignore_case=True)
print(res)
ahuigo
1

All of these answers are good, but you are missing python string substitution - it's simple and quick, but requires your string to be formatted correctly.

address = "123 %(direction)s anywhere street"
print(address % {"direction": "N"})
cacti5
  • This does not work for `'score: 99.5% name:%(name)s' %{'name':'foo'}`. – ahuigo Oct 02 '18 at 02:31
  • This assumes that we are in control of the string where the replacements are being made; i.e. that we are creating some kind of *template* to fill in with values. However, OP's issue seems to be specifically with *cleaning up* badly formatted data. If we could manually edit the input to have the `%` style placeholders, we might as well edit in the replacements directly. – Karl Knechtel Feb 04 '23 at 18:08
  • Probably want to use f-strings for modern Python (3.7+) – hobs Feb 05 '23 at 22:14
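The failure mentioned in the first comment comes from the bare `%` in "99.5%". A minimal sketch, assuming you control the template string (which, as noted above, is not the original poster's situation): a literal percent sign has to be written as `%%`.

```python
# A literal % is written as %% in a %-style template.
template = "score: 99.5%% name:%(name)s"
formatted = template % {"name": "foo"}

# Without the escape, "% n" is parsed as a (malformed) conversion specifier.
try:
    "score: 99.5% name:%(name)s" % {"name": "foo"}
    error = None
except ValueError as exc:
    error = exc

print(formatted)  # score: 99.5% name:foo
```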
1

A faster way to handle this would be to respect word boundaries and look up each token in your dictionary only once:

token_mapping = {
    'north': 'N', 'south': 'S',
    'east': 'E', 'west': 'W',
    'street': 'St',
    }

def tokenize(text):
    return text.lower().split()

def detokenize(tokens):
    return ' '.join(tokens)

def replace_tokens(text, token_mapping=token_mapping):
    input_tokens = tokenize(text)
    output_tokens = []
    for tok in input_tokens:
        output_tokens.append(token_mapping.get(tok, tok))
    return detokenize(output_tokens)
>>> replace_tokens("123 north anywhere street")
'123 N anywhere St'

Another advantage of this approach is that you can fold the case of individual tokens to suit your needs:

def detokenize(tokens):
    return ' '.join([t.title() for t in tokens])
>>> replace_tokens("123 north anywhere street")
'123 N Anywhere St'

This is the approach used by web-scale NLP, including spelling-correctors and abbreviation expanders/contractors.

hobs
0

The advantage of Duncan's approach is that it is careful not to overwrite previous replacements. For example, if you have {"Shirt": "Tank Top", "Top": "Sweater"}, the other approaches turn "Shirt" into "Tank Sweater".

The following code extends that approach, but sorts the keys such that the longest one is always found first and it uses named groups to search case insensitively.

import re
root_synonyms = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
# put the longest search term first. This means the system does not replace "top" before "tank top"
synonym_keys = sorted(root_synonyms.keys(),key=len,reverse=True)
# the groups will be named w0, w1, ... . Determine what each of them should become
number_mapping = {f'w{i}':root_synonyms[key] for i,key in enumerate(synonym_keys) }
# make a regex for each word where "tank top" or "tank  top" are the same:
# escape each whitespace-separated part and join the parts with \s+
search_terms = [r'\s+'.join(re.escape(part) for part in k.split()) for k in synonym_keys]
# wrap each search term in a named group w0, w1, ... with word boundaries
search_terms = [f'(?P<w{i}>\\b{key}\\b)' for i,key in enumerate(search_terms)]
# make one huge regex
search_terms = '|'.join(search_terms)
# compile it for speed
search_re = re.compile(search_terms,re.IGNORECASE)

query = "123 north anywhere street"
result = re.sub(search_re,lambda x: number_mapping[x.lastgroup],query)
print(result)
Jelmer Wind