How do you remove all characters in a string, except for those in a list?

Question

I have a list of strings and a list of characters I don't want. How do I remove the characters in that list from each string? For example:

l = ["Bananas :)", "apple :("]
characters_i_dont_want = [":", "a"]
for i in l:
    i = replace_all_characters_except_for_those_in_list(i, characters_i_dont_want)  # the function I'm looking for
    print(i)

Desired output:

Bnns )
pple (

Were you helped by any of the provided answers? Please accept one if so. — ddejohn, Oct 17 '21 at 19:18
Does this answer your question? [Best way to strip punctuation from a string](https://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string) — ddejohn, Oct 17 '21 at 19:19
Do you consider any of the provided answers satisfactory? If so, please consider accepting one. That way, your question will be removed from the unanswered queue. — ddejohn, Apr 08 '22 at 22:05

Jan Wilamowski · Answer 1 · 2021-10-07T04:44:30.937

0

You could combine all unwanted characters into a regular expression:

import re

pattern = f'[{"".join(characters_i_dont_want)}]'
for i in l:
    cleaned = re.sub(pattern, '', i)
    print(cleaned)

gives

Bnns )
pple (

Just be careful when you want to remove characters that have special meaning inside character classes: `-` (dash), `[` and `]` (brackets), `\` (backslash), and `^` (caret). Those have to be escaped, for example with `re.escape()`.
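A minimal sketch, not part of the original answer, that applies this caveat and the suggestion in the comments to compile the pattern: each character goes through `re.escape()` before it is placed in the character class, and the pattern is compiled once per call. The helper name `remove_chars` is hypothetical.

```python
import re


def remove_chars(strings, unwanted):
    """Hypothetical helper: delete every character in `unwanted` from each string."""
    if not unwanted:
        # "[]" is not a valid regex, and there is nothing to remove anyway
        return list(strings)
    # re.escape() escapes -, [, ], \ and ^ so they match literally in the class
    pattern = re.compile("[" + "".join(map(re.escape, unwanted)) + "]")
    return [pattern.sub("", s) for s in strings]


print(remove_chars(["Bananas :)", "apple :("], [":", "a"]))  # ['Bnns )', 'pple (']
print(remove_chars(["a-b^c]d\\e"], ["-", "^", "]", "\\"]))     # ['abcde']
```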

edited Oct 07 '21 at 04:44

answered Oct 07 '21 at 04:33

I suggest compiling regex patterns you plan to use more than once. – ddejohn Oct 07 '21 at 06:02

score 0 · Answer 2 · answered Oct 07 '21 at 05:08

There are many ways to solve this, and some of them are already covered in the other answers. Here's one more way to do it.

In this example, I use the regex OR operator (`|`) to join all the substrings into a single compiled pattern, then replace every match with an empty string.

import re

characters_i_dont_want = [":", "a"]
strings = ["Bananas :)", "apple :(", ":( Catamaran )"]

# escape each unwanted character (to handle regex metacharacters),
# then join them with | into a single compiled alternation pattern
p = re.compile('|'.join(map(re.escape, characters_i_dont_want)))

# replace every match with the empty string
for s in strings:
    print(p.sub('', s))

The output of this will be:

Bnns )
pple (
( Ctmrn )

ddejohn · Answer 3 · 2021-10-07T14:25:07.480

Solution using `str.translate()`

Here's an approach that hasn't been covered yet:

words = ["Bananas :)", "apple :("]
characters_i_dont_want = [":", "a"]
t = str.maketrans(dict.fromkeys(characters_i_dont_want, None))

for s in words:
    print(s.translate(t))

Output:

Bnns )
pple (

You can also update the translation table at any point, adding new characters, or even removing some (with .pop() or del):

t.update(dict.fromkeys(map(ord, "snp"), None))  # add 's', 'n', and 'p'
t.pop(ord("a"))  # drop 'a' from list of characters to remove

for s in words:
    print(s.translate(t))

Output:

Baaa )
ale (

Explanation

In general, to make a translation table for strings, you simply need a mapping with keys as ordinals. You can do this either by passing a dictionary of character mappings to str.maketrans(), e.g. {"a": "X"} which str.maketrans() will turn into {97: 'X'}, or you can skip str.maketrans() and map the ordinals yourself like I did in the second example with map(ord, "snp").
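To illustrate, here is a table built by hand from ordinals, next to what str.maketrans() produces from a character-keyed dictionary (a small sketch using the question's example strings):

```python
# A translation table maps Unicode ordinals (ints) to a replacement
# string, an ordinal, or None (None deletes the character).
by_hand = {ord(":"): None, ord("a"): None}
print("Bananas :)".translate(by_hand))  # Bnns )

# str.maketrans() converts single-character keys to their ordinals
from_dict = str.maketrans({"a": "X"})
print(from_dict)                       # {97: 'X'}
print("Bananas".translate(from_dict))  # BXnXnXs
```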

dict.fromkeys() takes an iterable and a default value and constructs the dictionary for you, so you don't need to write the comprehension {k: None for k in map(ord, "snp")}, which I personally find tedious at times.
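A quick check that both spellings build the same table. Since the value argument of dict.fromkeys() defaults to None, passing None explicitly is optional:

```python
via_fromkeys = dict.fromkeys(map(ord, "snp"), None)
via_comprehension = {k: None for k in map(ord, "snp")}
via_default = dict.fromkeys(map(ord, "snp"))  # value defaults to None

print(via_fromkeys == via_comprehension == via_default)  # True
print(via_fromkeys)  # {115: None, 110: None, 112: None}
```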

More on `str.maketrans()`

Another way to build a translation table is the 3-argument form of str.maketrans(), where all three arguments must be strings: the characters in the first string map to the characters in the second string (the two must be the same length), and the characters in the third string map to None. By passing two empty strings as the first two arguments, you can create a translation table purely for removing characters:

t = str.maketrans("", "", "a:")
# Bnns )
# pple (

However, this method is particularly nice if you want to not only remove characters but also translate them:

t = str.maketrans("a", "A", ":()")  # map "a" to "A" and remove ":", "(", and ")"
# BAnAnAs
# Apple

Just note what happens when you have common characters in the first and third strings:

t = str.maketrans("a", "A", "a")  # map "a" to "A" and remove "a"
# Bnns :)
# pple :(
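The snippets above show only the table; their output comments assume the same `for s in words: print(s.translate(t))` loop as before. A self-contained sketch that runs all three tables, printing lists so that the trailing space left behind by deleting `:` and the parentheses is visible:

```python
words = ["Bananas :)", "apple :("]

tables = {
    "remove only": str.maketrans("", "", "a:"),
    "translate and remove": str.maketrans("a", "A", ":()"),
    "overlapping first/third": str.maketrans("a", "A", "a"),
}

for label, t in tables.items():
    # printing a list shows each string's repr(), including trailing spaces
    print(label, [s.translate(t) for s in words])
# remove only ['Bnns )', 'pple (']
# translate and remove ['BAnAnAs ', 'Apple ']
# overlapping first/third ['Bnns :)', 'pple :(']
```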

Confirming by checking what str.maketrans("a", "A", "a") returned:

>>> t
{97: None}

To clarify, str.maketrans() always returns a plain dictionary keyed by ordinals. With the first two strings, it maps ordinals to ordinals, Dict[int, int] (e.g. str.maketrans("a", "A") returns {97: 65}), and the third string produces a Dict[int, None] that is then unioned with the first dictionary. This means that any character present in both the first and third strings is overwritten by the union, because again, it's just a normal dictionary:

>>> d = {1: "a"}
>>> d.update({1: "x", 2: "b"})
>>> d
{1: 'x', 2: 'b'}
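Printing the tables directly shows this (a quick sketch; note that the two-string form stores ordinals, not characters, as values):

```python
# Two-string form: ordinals map to ordinals (ord("A") == 65)
print(str.maketrans("a", "A"))         # {97: 65}
# Third string: ordinals map to None, added after the first two strings
print(str.maketrans("a", "A", ":()"))  # {97: 65, 58: None, 40: None, 41: None}
# "a" appears in the first and third strings, so None wins
print(str.maketrans("a", "A", "a"))    # {97: None}
```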

The 3-argument form is essentially equivalent to this code (minus all the safety checks the built-in version has, as well as the other translation table creation methods):

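def make_translation_table(from_chars, to_chars, delete_chars):
    # like str.maketrans(), map ordinals to ordinals for the first two strings
    char_to_char = dict(zip(map(ord, from_chars), map(ord, to_chars)))
    # every character in the third string maps to None (deletion)
    char_to_none = dict.fromkeys(map(ord, delete_chars), None)
    # later keys win, so characters also in delete_chars end up deleted
    return {**char_to_char, **char_to_none}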

A note on performance

Where str.translate() really shines is performance: at the end of the day, it's just a lookup table. Here's a comparison with the regex solution from @JoeFerndz, using a string of 100,000 random characters and a list of 32 characters to remove. Spoiler: str.translate() is ~12x faster than the regex here.

In [1]: import re
   ...: import random
   ...: import string

In [2]: s = "".join(random.choice(string.printable) for _ in range(100_000))
   ...: banned = string.punctuation  # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

In [3]: p = re.compile('|'.join(map(re.escape, banned)))  # Joe's regex pattern
   ...: t = str.maketrans("", "", banned)  # my translation table

In [4]: s.translate(t) == p.sub("", s)
Out[4]: True

In [5]: %timeit s.translate(t)
364 µs ± 1.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit p.sub("", s)
4.61 ms ± 12.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Another solution is to simply iterate over the characters to replace, which performs quite well (better than regex, even):

In [7]: def with_replace(s, banned):
   ...:     for char in banned:
   ...:         s = s.replace(char, "")
   ...:     return s
   ...:

In [8]: with_replace(s, banned) == p.sub("", s) == s.translate(t)
Out[8]: True

In [9]: %timeit with_replace(s, banned)
2.2 ms ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Now compare any of those solutions to the worst performer of them all, creating the new string character-by-character by filtering:

In [10]: def char_by_char(s, banned):
    ...:     result = ""
    ...:     for char in s:
    ...:         if char not in banned:
    ...:             result += char
    ...:     return result
    ...:

In [11]: char_by_char(s, banned) == p.sub("", s) == s.translate(t)
Out[11]: True

In [12]: %timeit char_by_char(s, banned)
8.35 ms ± 207 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
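For anyone not using IPython, a rough sketch of the same comparison as a plain script with the standard-library timeit module. It reuses the setup and functions from the session above; timings depend on the machine and Python version, so no numbers are claimed here.

```python
import random
import re
import string
import timeit

# Same setup as the session: 100,000 random printable characters,
# with all 32 punctuation characters banned.
s = "".join(random.choice(string.printable) for _ in range(100_000))
banned = string.punctuation

p = re.compile("|".join(map(re.escape, banned)))  # regex alternation
t = str.maketrans("", "", banned)                  # deletion table


def with_replace(s, banned):
    for char in banned:
        s = s.replace(char, "")
    return s


def char_by_char(s, banned):
    result = ""
    for char in s:
        if char not in banned:
            result += char
    return result


candidates = {
    "translate": lambda: s.translate(t),
    "regex": lambda: p.sub("", s),
    "replace loop": lambda: with_replace(s, banned),
    "char by char": lambda: char_by_char(s, banned),
}

# All approaches must agree before timing them.
expected = s.translate(t)
assert all(fn() == expected for fn in candidates.values())

for name, fn in candidates.items():
    # best of 3 repeats of 10 calls each, reported as time per call
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name:>12}: {best * 1000:.2f} ms")
```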

George Rahul · Answer 4 · 2021-10-07T15:29:59.190

l = ["Bananas :)", "apple :("]
characters_i_dont_want = [":", "a"]
for i in l:
    for ch in characters_i_dont_want:
        i = i.replace(ch, '')  # replace the character with an empty string
    print(i)

I think this might be an easier solution.
Using regex can help if you know about it; otherwise, a string replace as shown above is more than sufficient.
See more about string replacement here.

score -1 · Answer 5 · answered Oct 07 '21 at 05:01

You can do something like this:

l = ["Bananas :)", "apple :("]

for word in l:
    replaced = ''
    for char in word:
        # Checking if character is equal to a or :
        #if that is so do nothing
        if char == ':' or char == 'a':
            replaced += char
        else:
            replaced += char.replace(char, '*')
            
    print(replaced)

Here I append each character to the `replaced` variable, first replacing it with `*` unless it is `a` or `:`. Note: .replace(toReplace, replaceWith) returns a copy of the string with the replacement applied; it does not alter the original string itself.

Aashish Pal

Some feedback: this won't scale well if OP wants to remove lots of characters. It'd be better to use a membership test: `if char in characters_i_dont_want`. PS, strings are already immutable, so you'll always be returning a new string after any string operation :) – ddejohn Oct 07 '21 at 06:04
Also this code doesn't even work. In the branch where you say "do nothing" you actually end up appending those chars to `replaced`. You can also replace with an empty string, which is what OP wants, not `"*"`. – ddejohn Oct 07 '21 at 06:25
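A sketch of the loop above with the fixes suggested in these comments applied: a membership test against `characters_i_dont_want`, and unwanted characters are skipped (the same as replacing them with an empty string) rather than turned into `*`. The function name `remove_unwanted` is hypothetical, and collecting characters in a list for "".join() is a swapped-in idiom rather than the answer's `+=`.

```python
def remove_unwanted(word, unwanted):
    """Hypothetical helper: return `word` without any character in `unwanted`."""
    unwanted = set(unwanted)  # set membership tests are O(1) on average
    kept = []
    for char in word:
        if char not in unwanted:  # keep only the characters we want
            kept.append(char)
    return "".join(kept)


l = ["Bananas :)", "apple :("]
characters_i_dont_want = [":", "a"]
for word in l:
    print(remove_unwanted(word, characters_i_dont_want))
# Bnns )
# pple (
```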
