Solution using str.translate()
Here's an approach that hasn't been covered yet:
words = ["Bananas :)", "apple :("]
characters_i_dont_want = [":", "a"]
t = str.maketrans(dict.fromkeys(characters_i_dont_want, None))
for s in words:
    print(s.translate(t))
Output:
Bnns )
pple (
You can also update the translation table at any point, adding new characters or even removing some (with .pop() or del):
t.update(dict.fromkeys(map(ord, "snp"), None)) # add 's', 'n', and 'p'
t.pop(ord("a")) # drop 'a' from list of characters to remove
for s in words:
    print(s.translate(t))
Output:
Baaa )
ale (
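The del form mentioned above works the same way as .pop(), since the table is an ordinary dictionary. A minimal sketch, using a fresh table so it runs on its own:

```python
# Build a table that removes ':' and 'a', then stop removing ':'.
t = str.maketrans(dict.fromkeys(":a", None))
del t[ord(":")]  # keys are ordinals, so look the character up with ord()

print("apple :(".translate(t))  # pple :(
```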
Explanation
In general, to make a translation table for strings, you simply need a mapping with keys as ordinals. You can do this either by passing a dictionary of character mappings to str.maketrans(), e.g. {"a": "X"}, which str.maketrans() will turn into {97: 'X'}, or you can skip str.maketrans() and map the ordinals yourself like I did in the second example with map(ord, "snp").
dict.fromkeys() takes an iterable and a default value and constructs the dictionary for you, so you don't have to write out the comprehension {k: None for k in map(ord, "snp")}, which I personally find tedious at times.
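As a quick sanity check of the routes described above, this sketch builds the same deletion table three ways and confirms they agree. The variable names are only illustrative.

```python
# Three ways to build a table that deletes 's', 'n', and 'p'.
via_maketrans = str.maketrans(dict.fromkeys("snp", None))
via_fromkeys = dict.fromkeys(map(ord, "snp"), None)
via_comprehension = {k: None for k in map(ord, "snp")}

# str.maketrans() converts one-character string keys to ordinals
# and leaves the values untouched.
assert str.maketrans({"a": "X"}) == {97: "X"}

# All three tables are plain dicts with identical contents.
assert via_maketrans == via_fromkeys == via_comprehension

print("Bananas :)".translate(via_fromkeys))  # Baaa :)
```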
More on str.maketrans()
Another way to build a translation table is by using the 3-argument form of str.maketrans(), where all three arguments must be strings: the characters in the first string map to the characters in the second string (the two must be the same length), and the characters in the third string map to None. By passing two empty strings as the first two arguments, you can create a translation table purely for removing characters:
t = str.maketrans("", "", "a:")
# Bnns )
# pple (
However, this form is particularly nice if you want to translate some characters as well as remove others:
t = str.maketrans("a", "A", ":()") # map "a" to "A" and remove ":", "(", and ")"
# BAnAnAs
# Apple
Just note what happens when you have common characters in the first and third strings:
t = str.maketrans("a", "A", "a") # map "a" to "A" and remove "a"
# Bnns :)
# pple :(
Confirming by checking what str.maketrans("a", "A", "a") returned:
>>> t
{97: None}
To clarify, the string forms of str.maketrans() always return a dictionary of the type Dict[int, Optional[int]]. With the first two strings, str.maketrans() creates a dictionary mapping ordinals to ordinals, Dict[int, int] (for example, str.maketrans("a", "A") returns {97: 65}), and with the third string a Dict[int, None] is made and then unioned with the first dictionary. The dictionary form, by contrast, only converts the keys and leaves your values untouched, which is why {"a": "X"} becomes {97: 'X'}. This means that any characters present in both the first and third strings will be overwritten by the union, because again, it's just a normal dictionary:
>>> d = {1: "a"}
>>> d.update({1: "x", 2: "b"})
>>> d
{1: 'x', 2: 'b'}
The 3-argument form is essentially equivalent to this code (minus all the safety checks the built-in version has, as well as the other translation table creation methods):
def make_translation_table(from_chars, to_chars, delete_chars):
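    # The built-in stores ordinals as values; characters behave the same in translate()
    char_to_char = dict(zip(map(ord, from_chars), to_chars))
    char_to_none = dict.fromkeys(map(ord, delete_chars), None)
    # Later keys win, so deletions override mappings for shared characters
    return {**char_to_char, **char_to_none}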
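To see that this sketch and the built-in agree, you can compare what the resulting tables do rather than what they contain. The sample inputs are the ones used earlier in this answer, and the function is repeated so the snippet runs on its own.

```python
def make_translation_table(from_chars, to_chars, delete_chars):
    char_to_char = dict(zip(map(ord, from_chars), to_chars))
    char_to_none = dict.fromkeys(map(ord, delete_chars), None)
    return {**char_to_char, **char_to_none}

words = ["Bananas :)", "apple :("]
cases = [("", "", "a:"), ("a", "A", ":()"), ("a", "A", "a")]

for args in cases:
    builtin = str.maketrans(*args)
    sketch = make_translation_table(*args)
    for s in words:
        # Both tables must produce the same translated string.
        assert s.translate(builtin) == s.translate(sketch)

# The stored values differ (ordinals vs. characters); the behaviour does not.
print(str.maketrans("a", "A"))               # {97: 65}
print(make_translation_table("a", "A", ""))  # {97: 'A'}
```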
A note on performance
Where the str.translate() technique really shines is the fact that it's just a lookup table at the end of the day. Here's a comparison with the regex solution from @JoeFerndz, using a string of 100,000 random characters and a list of 32 characters to remove. Spoiler: str.translate() is ~12x faster than using regex.
In [1]: import re
   ...: import random
   ...: import string
In [2]: s = "".join(random.choice(string.printable) for _ in range(100_000))
   ...: banned = string.punctuation  # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
In [3]: p = re.compile('|'.join(map(re.escape, banned)))  # Joe's regex pattern
   ...: t = str.maketrans("", "", banned)  # my translation table
In [4]: s.translate(t) == p.sub("", s)
Out[4]: True
In [5]: %timeit s.translate(t)
364 µs ± 1.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit p.sub("", s)
4.61 ms ± 12.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Another solution is to simply loop over the characters to remove and call str.replace() for each one, which performs quite well (better than regex, even):
In [7]: def with_replace(s, banned):
   ...:     for char in banned:
   ...:         s = s.replace(char, "")
   ...:     return s
   ...:
In [8]: with_replace(s, banned) == p.sub("", s) == s.translate(t)
Out[8]: True
In [9]: %timeit with_replace(s, banned)
2.2 ms ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Now compare any of those solutions to the worst performer of them all, creating the new string character-by-character by filtering:
In [10]: def char_by_char(s, banned):
    ...:     result = ""
    ...:     for char in s:
    ...:         if char not in banned:
    ...:             result += char
    ...:     return result
    ...:
In [11]: char_by_char(s, banned) == p.sub("", s) == s.translate(t)
Out[11]: True
In [12]: %timeit char_by_char(s, banned)
8.35 ms ± 207 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
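If you want to rerun the comparison outside IPython, a plain timeit script along these lines should work. It is only a sketch: the repeat count is an arbitrary choice, and the absolute numbers will differ by machine and Python version, so no timings are shown here.

```python
import random
import re
import string
import timeit

s = "".join(random.choice(string.printable) for _ in range(100_000))
banned = string.punctuation

p = re.compile("|".join(map(re.escape, banned)))  # regex alternation
t = str.maketrans("", "", banned)                 # translation table

def with_replace(s, banned):
    for char in banned:
        s = s.replace(char, "")
    return s

def char_by_char(s, banned):
    result = ""
    for char in s:
        if char not in banned:
            result += char
    return result

candidates = {
    "translate": lambda: s.translate(t),
    "regex": lambda: p.sub("", s),
    "replace": lambda: with_replace(s, banned),
    "char_by_char": lambda: char_by_char(s, banned),
}

# Every approach must produce the same string before timing means anything.
expected = s.translate(t)
assert all(fn() == expected for fn in candidates.values())

repeats = 20  # arbitrary; raise it for steadier numbers
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=repeats) / repeats
    print(f"{name:>12}: {seconds * 1e3:.3f} ms per call")
```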