Applying a dictionary of string replacements to a list of strings

Question

Say I have a list of strings and a dictionary specifying replacements:

E.g.

my_replacements = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}

and a list of strings, where each string may include keys from the above dictionary, e.g.:

['I own 1/2 bottle', 'Give me 3/4 of the profit']

How can I apply the replacements to the list? What would be a Pythonic way to do this?

roippi · Accepted Answer · 2014-04-28T15:12:34.447

5

O(n) solution:

reps = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}
li = ['I own 1/2 bottle', 'Give me 3/4 of the profit']

list(map(lambda s: ' '.join([reps.get(w, w) for w in s.split()]), li))  # list() is needed on Python 3, where map is lazy
Out[6]: ['I own half bottle', 'Give me three quarters of the profit']

#for those who don't like `map`, the list comp version:
[' '.join([reps.get(w,w) for w in sentence.split()]) for sentence in li]
Out[9]: ['I own half bottle', 'Give me three quarters of the profit']

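The issue with making lots of replace calls in a loop is that every string is rescanned once per dictionary key, so the cost grows with (number of keys) * (length of text); if both grow together, your algorithm is effectively O(n**2). Not a big deal when you have a replacement dict of length 3, but when it gets large, suddenly you have a really slow algorithm that doesn't need to be.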

As noted in comments, this approach fundamentally depends on being able to tokenize based on spaces. Thus, if you have any whitespace in your replacement keys (say, you want to replace a series of words), this approach will not work. However, replacing single words is a far more frequent operation than replacing groupings of words, so I disagree with the commenters who believe that this approach isn't generic enough.
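A minimal sketch of a variant that keeps the one-lookup-per-token idea but preserves the original whitespace, which a plain split/join collapses to single spaces. The helper name `replace_tokens` is hypothetical, and this still assumes keys contain no whitespace and are not glued to punctuation (a token like '1/2.' would not match):

```python
import re

reps = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}

def replace_tokens(s, reps):
    # hypothetical helper, not part of the original answer
    # The capturing group makes re.split return the whitespace runs too,
    # so joining the parts back reproduces the original spacing.
    parts = re.split(r'(\s+)', s)
    # Whitespace runs are not keys in reps, so they pass through unchanged.
    return ''.join(reps.get(p, p) for p in parts)

print(replace_tokens('I own  1/2\tbottle', reps))
```

Each token still costs a single dict lookup, so the work stays proportional to the length of the text.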

edited Apr 28 '14 at 15:12

answered Apr 28 '14 at 14:44


The only problem with this solution is, we cannot keep track of the whitespace characters properly. – thefourtheye Apr 28 '14 at 14:50
talking of performance, how about replacing that list comprehension with a generator expression; also, the OP asked for a Pythonic way... `map` with a `lambda` is not considered Pythonic nowadays. – Erik Kaplun Apr 28 '14 at 14:51
Interesting; didn't know that. I thought `join` could appreciate the laziness of its argument. – Erik Kaplun Apr 28 '14 at 14:55
Btw, I won't downvote, but this answer sacrifices 1) readability/simplicity and 2) genericity (it doesn't work on all inputs) in favor of (most probably premature) optimization. – Erik Kaplun Apr 28 '14 at 14:58
I've addressed the comments on genericity in the question itself. As for premature optimization, that's not the point here. Understanding algorithmic complexity is a *vital* tool for lots of nascent programmers who come asking questions on SO. Knowing how your algorithm scales with inputs is a Big Deal, and it's a Very Good Idea to get in the habit of having your brain constantly question the big-oh of the code you write. – roippi Apr 28 '14 at 15:26
there's always the readability/simplicity and performance tradeoff tho. – Erik Kaplun Apr 28 '14 at 15:54

vaultah · Answer 2 · 2014-04-28T14:34:44.503

3

a = ['I own 1/2 bottle', 'Give me 3/4 of the profit']
b = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}

def replace(x):
    for what, new in b.items(): # or iteritems in Python 2
        x = x.replace(what, new)
    return x

print(list(map(replace, a)))

Output:

['I own half bottle', 'Give me three quarters of the profit']
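One thing to watch for with chained `str.replace` calls is that the output of one replacement can be picked up by a later one. A small sketch, assuming a hypothetical dictionary (not from the question) where one value is also a key, and Python 3.7+ where dicts iterate in insertion order:

```python
def replace_sequentially(s, reps):
    # Same loop as above, with the dictionary passed in explicitly.
    for old, new in reps.items():
        s = s.replace(old, new)
    return s

# Hypothetical data: 'dog' is both a replacement value and a key.
chained = {'cat': 'dog', 'dog': 'bird'}

# 'cat' becomes 'dog' first, then every 'dog' (including the new one)
# becomes 'bird'.
print(replace_sequentially('cat and dog', chained))  # bird and bird
```

With the question's fraction keys this cannot happen, since no replacement value contains another key.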

edited Apr 28 '14 at 14:34

answered Apr 28 '14 at 14:29


Pretty nice, but you should probably do `for old, new in b.iteritems():` and `replace(old, new)` to avoid a gratuitous dict lookup. – John Zwinck Apr 28 '14 at 14:30

Erik Kaplun · Answer 3 · 2014-04-28T14:37:20.557

I'd use something like this:

def replace_all(replacements, s):
    for old, new in replacements.items():
        s = s.replace(old, new)
    return s

my_replacements = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}
strings = ['I own 1/2 bottle', 'Give me 3/4 of the profit']

print(", ".join(replace_all(my_replacements, x) for x in strings))

Output:

I own half bottle, Give me three quarters of the profit

score 2 · Answer 4 · answered Apr 28 '14 at 14:52

Use re.sub.

import re

my_replacements = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}
strings = ['I own 1/2 bottle', 'Give me 3/4 of the profit']

print([re.sub(r'\d/\d', lambda x: my_replacements[x.group()], string) for string in strings])

output:

['I own half bottle', 'Give me three quarters of the profit']
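Note that the pattern `\d/\d` matches any digit/digit pair, so a fraction missing from the dictionary would raise a `KeyError` in the lambda. A hedged sketch of one way around that, using a hypothetical `lookup` helper that falls back to the matched text:

```python
import re

my_replacements = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}

def lookup(match):
    # hypothetical helper: leave fractions alone if they are not in the dict
    text = match.group()
    return my_replacements.get(text, text)

# '5/8' is not a key, so it is left untouched instead of raising KeyError.
print(re.sub(r'\d/\d', lookup, 'Mix 1/2 cup with 5/8 cup'))  # Mix half cup with 5/8 cup
```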

score 2 · Answer 5 · answered May 01 '14 at 16:08

If you expect the strings in the list to have many matches, and you are applying my_replacements to a large list or to many lists, then it might make sense to construct a pattern and use re.sub. The following solution, unlike user2931409's, doesn't require any special structure in the replacement keys, and it should perform at least as well as roippi's solution, because it doesn't make multiple passes over the input strings either:

import re

my_replacements = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}

l = ['I own 1/2 bottle', 'Give me 3/4 of the profit']

def do_replacement(match):
    return my_replacements[match.group(0)]

r = re.compile('|'.join('(?:%s)' % (re.escape(k)) for k in my_replacements.keys()))

[r.sub(do_replacement, s) for s in l]  # sub() takes the replacement first, then the string
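A possible way to package the same idea as a reusable function is sketched below. The name `make_replacer` and the extra key '1/2 dozen' are hypothetical, added only for illustration. Sorting the keys longest-first is an assumption on my part: regex alternation tries alternatives left to right, so without it a shorter key could win over a longer key that starts with it. The sketch also assumes a non-empty dictionary.

```python
import re

def make_replacer(replacements):
    # hypothetical helper; assumes `replacements` is non-empty
    # Longest keys first, so '1/2 dozen' is tried before '1/2'.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))

    def replace(s):
        # One pass over s; each match is looked up in the dict.
        return pattern.sub(lambda m: replacements[m.group(0)], s)

    return replace

replace = make_replacer({
    '1/2': 'half',
    '1/4': 'quarter',
    '3/4': 'three quarters',
    '1/2 dozen': 'six',  # hypothetical key containing whitespace
})

print([replace(s) for s in ['I own 1/2 bottle', 'Buy 1/2 dozen eggs']])
```

Unlike the split-based approach, this handles keys that contain whitespace, and the pattern is compiled only once.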

score 0 · Answer 6 · answered Apr 28 '14 at 14:38

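I have used dictionary-based formatting expressions (the % operator with a mapping on the right-hand side). This only works if the strings are written as templates, with each key wrapped in a %(key)s placeholder.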

Docs: https://docs.python.org/2/library/string.html#format-examples

my_replacements = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}
c = 'I own %(1/2)s bottle, Give me %(3/4)s of the profit' % my_replacements
print(c)
# I own half bottle, Give me three quarters of the profit
