
I am trying to write a simple script that replaces all occurrences of a certain group of characters (or set of strings) in a text.

In this case, I want to replace all the letters "a, e, i, o, u" with a certain string.

My script:

def replace_all(text, repl):
    text1 = text.replace("a", repl)
    text2 = text1.replace("e", repl)
    text3 = text2.replace("i", repl)
    text4 = text3.replace("o", repl)
    text5 = text4.replace("u", repl)
    return text5

Is there a simpler way of doing it? And what if I need to replace a bigger group of characters or strings? Chaining calls like this does not seem very effective then.

This may be a basic question, but I am still learning, so maybe I will get to it in later lessons. Thank you in advance for any advice.

Nicolas Gervais
Blacho

3 Answers


As far as I know, there are three different ways of doing this, all of which are shorter than your method:

  • Using a for-loop
  • Using a generator expression
  • Using regular expressions

First, using a for-loop. This is probably the most straightforward improvement to your code, and it reduces the five .replace lines to two:

def replace_all(text, repl):
    for c in "aeiou":
        text = text.replace(c, repl)
    return text

You could also do it in one line using a generator expression combined with the str.join method. This goes through the text once, evaluating each character a single time, so it is O(n). The first method is also O(n), not O(n^5): it loops through the text five times, once per replace call, which is 5n steps, so the two differ only by a constant factor. Because str.replace runs in C, the loop version may well be the faster one in practice.

So, this method is simply:

def replace_all(text, repl):
    return ''.join(repl if c in 'aeiou' else c for c in text)
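The two versions are not always interchangeable. Each replace in the loop runs on the output of the previous one, so if repl itself contains a vowel, later passes rewrite it, while the single-pass version only checks the original characters. A quick check with the input 'cat' and the replacement 'oo' (both chosen just for illustration; the function names are hypothetical):

```python
def replace_all_loop(text, repl):
    # Sequential passes: each replace sees the output of the previous one
    for c in "aeiou":
        text = text.replace(c, repl)
    return text

def replace_all_single_pass(text, repl):
    # One pass: every character of the original text is checked exactly once
    return ''.join(repl if c in 'aeiou' else c for c in text)

print(replace_all_loop('cat', 'oo'))         # coooot
print(replace_all_single_pass('cat', 'oo'))  # coot
```

With a replacement such as 'x' that contains no vowels, both return the same result.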

Finally, we can use re.sub to substitute every character in the set [aeiou] with the string repl. This is the shortest of the solutions and probably what I would recommend:

import re
def replace_all(text, repl):
    return re.sub('[aeiou]', repl, text)
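One detail to watch with re.sub: when the replacement is a string, backslashes in it are processed as escapes and group references, so a repl such as '\\1' is read as a reference to group 1 and raises an error with this pattern. If repl may contain backslashes, passing a function instead inserts its return value verbatim. A sketch, with a hypothetical function name:

```python
import re

def replace_all_literal(text, repl):
    # A function as the replacement: re.sub inserts its return value as-is,
    # so backslashes in repl are not treated as escapes or group references
    return re.sub('[aeiou]', lambda match: repl, text)

print(replace_all_literal('hello world', 'x'))  # hxllx wxrld
print(replace_all_literal('cat', '\\1'))        # c\1t
```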

All of these methods produce the same result, so there is no point in providing individual test cases, but you can see them work in this test:

>>> replace_all('hello world', 'x')
'hxllx wxrld'

Update

A new method has been brought to my attention: str.translate.

>>> {c:'x' for c in 'aeiou'}
{'a': 'x', 'e': 'x', 'i': 'x', 'o': 'x', 'u': 'x'}
>>> 'hello world'.translate({ord(c):'x' for c in 'aeiou'})
'hxllx wxrld'

This method is also O(n), a single pass over the text, so it has the same complexity as the generator-expression and regex versions.
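Building the table with a dict comprehension and ord works; str.maketrans can also build it from a plain {character: replacement} dict, and its values may be strings of any length. Because translate looks up each original character once, a replacement that contains a vowel is not rewritten again. A minimal sketch using str.maketrans:

```python
def replace_all(text, repl):
    # str.maketrans turns {'a': repl, ...} into the {ord('a'): repl, ...}
    # table that str.translate expects
    table = str.maketrans(dict.fromkeys('aeiou', repl))
    return text.translate(table)

print(replace_all('hello world', 'x'))  # hxllx wxrld
print(replace_all('cat', 'oo'))         # coot
```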

Joe Iddon
  • What about str.translate()? And why would this be a good or a bad idea? – Outcast May 30 '19 at 12:13
  • @PoeteMaudit Never encountered that method before! See my update. I don't know what you mean by "good" or "bad" - you can do whatever you want so long as it is as efficient as is required, which I would expect this method to be. – Joe Iddon May 30 '19 at 17:53
  • I am surprised that you have not encountered it before. See this too: https://stackoverflow.com/questions/56378872/replace-multiple-special-characters-most-efficient-way. – Outcast May 30 '19 at 17:56

This is a fine place for a regular expression:

import re

def replace_all(text, repl):
    return re.sub('[aeiou]', repl, text)

This will work for the case in your question, where you're replacing single characters. If you want to replace a set of longer strings:

def replace_all(text, to_replace, replacement):
    # re.escape makes characters such as '.' or '*' match literally
    pattern = '|'.join(re.escape(s) for s in to_replace)
    return re.sub(pattern, replacement, text)

>>> replace_all('this is a thing', ['thi','a'], 'x')
'xs is x xng'
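Regex alternation tries alternatives from left to right and takes the first one that matches at a given position, so when one target is a prefix of another (say 'a' and 'ab'), the order of the list changes the result. A sketch that sorts targets longest-first and escapes them so metacharacters match literally (the function name is hypothetical):

```python
import re

def replace_all_longest_first(text, to_replace, replacement):
    # Longer targets first, so 'ab' is tried before its prefix 'a'
    ordered = sorted(to_replace, key=len, reverse=True)
    # re.escape makes metacharacters like '.' or '*' match literally
    pattern = '|'.join(re.escape(s) for s in ordered)
    return re.sub(pattern, replacement, text)

print(re.sub('a|ab', 'x', 'abc'))                          # xbc
print(replace_all_longest_first('abc', ['a', 'ab'], 'x'))  # xc
print(replace_all_longest_first('this is a thing', ['thi', 'a'], 'x'))  # xs is x xng
```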
Nathan Vērzemnieks

What you're doing is perfectly valid; however, there are better ways.

Here are some solutions, with runtimes taken over 100000 loops.

The main signature:

targets are the characters you want to replace; repl is the replacement character.

def replace_all(text, targets=['a', 'e', 'i', 'o', 'u'], repl='_'):
    # Swap characters into a list; any of the expressions below can go here
    text = [c if c not in targets else repl for c in text]
    return ''.join(text)  # Stitch it back together

Bytearray

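A bytearray is a mutable sequence of bytes (integers from 0 to 255), not of characters. Since strings in Python are immutable, it seems like the ideal data structure for avoiding constant construction and destruction of strings. Note, however, that the comprehension below only reads from it, and that encoding to UTF-8 splits any non-ASCII character into several bytes, so chr(c) only reproduces ASCII text correctly.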

[chr(c) if chr(c) not in targets else repl for c in bytearray(text, 'utf-8')]

Runs in 0.365
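To see the UTF-8 caveat in action, here is the byte-based comprehension next to the plain character-based one on an input containing a non-ASCII letter ('café' is just an illustrative choice):

```python
targets = ['a', 'e', 'i', 'o', 'u']
repl = '_'
text = 'café'  # illustrative input with one non-ASCII character

# Byte by byte: 'é' is two bytes in UTF-8, and chr() turns each into its own character
via_bytes = ''.join(chr(c) if chr(c) not in targets else repl
                    for c in bytearray(text, 'utf-8'))
# Character by character on the str itself
via_str = ''.join(c if c not in targets else repl for c in text)

print(repr(via_bytes))  # 'c_fÃ©'
print(repr(via_str))    # 'c_fé'
```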

Without bytearray

This operates on a plain list. The list is mutable, but its elements are one-character strings, which are immutable; nothing is modified in place, since the comprehension simply collects either the original character or repl.

[c if c not in targets else repl for c in text]

Runs in 0.179

Map

This maps the function onto each character in the string. In Python 3, map returns a lazy iterator, which ''.join then consumes.

map(lambda c: c if c not in targets else repl, text) 

Runs in 0.265
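A sketch of how such a comparison could be reproduced with the timeit module. The test string is an assumption (the timings above do not say which input was used), so absolute numbers will differ by machine and input; the assertion checks that the three variants agree before their timings are compared:

```python
import timeit

targets = ['a', 'e', 'i', 'o', 'u']
repl = '_'
text = 'hello world'  # assumed input; not stated in the original timings

def with_bytearray():
    return ''.join([chr(c) if chr(c) not in targets else repl
                    for c in bytearray(text, 'utf-8')])

def with_list():
    return ''.join([c if c not in targets else repl for c in text])

def with_map():
    return ''.join(map(lambda c: c if c not in targets else repl, text))

# The variants must agree before their timings are worth comparing
assert with_bytearray() == with_list() == with_map() == 'h_ll_ w_rld'

for fn in (with_bytearray, with_list, with_map):
    seconds = timeit.timeit(fn, number=100_000)  # total time for 100,000 calls
    print(f'{fn.__name__}: {seconds:.3f} s')
```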