Removing unwanted characters from a string in Python

Question

I have some strings from which I want to delete unwanted characters, for example: Adam'sApple ----> AdamsApple (case insensitive). Can someone help? I need the fastest way to do it, because I have a couple of million records to clean up. Thanks.

Could you be more specific? Which exact characters do you want removed? — Syntactic, May 06 '10 at 12:06

score 6 · Answer 1 · answered Sep 18 '14 at 22:41

Here is a function that removes all the irritating ASCII characters; the only exception is "&", which is replaced with "and". I use it to police a filesystem and ensure that all files adhere to the naming scheme I insist everyone uses.

def cleanString(incomingString):
    newstring = incomingString
    newstring = newstring.replace("!","")
    newstring = newstring.replace("@","")
    newstring = newstring.replace("#","")
    newstring = newstring.replace("$","")
    newstring = newstring.replace("%","")
    newstring = newstring.replace("^","")
    newstring = newstring.replace("&","and")
    newstring = newstring.replace("*","")
    newstring = newstring.replace("(","")
    newstring = newstring.replace(")","")
    newstring = newstring.replace("+","")
    newstring = newstring.replace("=","")
    newstring = newstring.replace("?","")
    newstring = newstring.replace("\'","")
    newstring = newstring.replace("\"","")
    newstring = newstring.replace("{","")
    newstring = newstring.replace("}","")
    newstring = newstring.replace("[","")
    newstring = newstring.replace("]","")
    newstring = newstring.replace("<","")
    newstring = newstring.replace(">","")
    newstring = newstring.replace("~","")
    newstring = newstring.replace("`","")
    newstring = newstring.replace(":","")
    newstring = newstring.replace(";","")
    newstring = newstring.replace("|","")
    newstring = newstring.replace("\\","")
    newstring = newstring.replace("/","")        
    return newstring

I wrote it before I got into regular expressions; it is basically the code equivalent of my embarrassing goth phase. It does, however, allow the untrained to make modifications, which is pretty much a necessity at my work. — Pescolly, Sep 21 '16 at 19:24
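For comparison, the same mapping can probably be expressed as a single translation table. This is only a sketch, assuming Python 3 (where `str.maketrans` accepts a dict); the names `_DELETE`, `_TABLE` and `clean_string` are made up for illustration and are not part of the answer above.

```python
# Characters that the function above deletes outright.
_DELETE = "!@#$%^*()+=?'\"{}[]<>~`:;|\\/"

# Map every deleted character to None, and "&" to the word "and".
_TABLE = str.maketrans({**dict.fromkeys(_DELETE), "&": "and"})

def clean_string(incoming):
    # A single pass over the string instead of one replace() call per character.
    return incoming.translate(_TABLE)

print(clean_string("R&D: plan #1?"))  # RandD plan 1
```

To change the rules, edit `_DELETE` or add another entry next to `"&"`.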

score 6 · Accepted Answer · answered May 06 '10 at 12:09

One simple way:

>>> s = "Adam'sApple"
>>> x = s.replace("'", "")
>>> print(x)
AdamsApple

... or take a look at regex substitutions.

miku

score 5 · Answer 3 · answered May 06 '10 at 14:20

Any characters in the 2nd argument of the translate method are deleted:

>>> "Adam's Apple!".translate(None,"'!")
'Adams Apple'

NOTE: translate requires Python 2.6 or later to use None for the first argument, which otherwise must be a translation string of length 256. string.maketrans('','') can be used in place of None for pre-2.6 versions.
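The call above uses the Python 2 signature. In Python 3, `str.translate` takes only a table, and the characters to delete go into the third argument of `str.maketrans`. A minimal sketch of the equivalent, assuming Python 3:

```python
# Python 3: the third argument of str.maketrans lists the characters to delete.
table = str.maketrans("", "", "'!")

cleaned = "Adam's Apple!".translate(table)
print(cleaned)  # Adams Apple
```

For millions of records, the table can be built once and reused for every string.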

edited May 06 '10 at 14:31

answered May 06 '10 at 14:20

Mark Tolonen

It might be helpful to explicitly mention `string.maketrans('', '')` as a substitute for `None` for Python < 2.6 – jfs May 06 '10 at 14:26
Six times faster than `"".join(char for char in text if char not in bad_chars)` :) – badp May 06 '10 at 16:22
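A speed claim like this is easy to check on your own data. The following is a rough benchmark sketch, assuming Python 3; the function names and the sample string are made up for illustration, and the timings will vary by machine and input, so no figures are quoted here.

```python
import timeit

BAD_CHARS = "'!"
TABLE = str.maketrans("", "", BAD_CHARS)  # deletion table, built once

def with_translate(text):
    return text.translate(TABLE)

def with_join(text):
    return "".join(ch for ch in text if ch not in BAD_CHARS)

def with_replace(text):
    for ch in BAD_CHARS:
        text = text.replace(ch, "")
    return text

sample = "Adam's Apple!" * 100

# Time 1000 calls of each approach on the same input.
for func in (with_translate, with_join, with_replace):
    seconds = timeit.timeit(lambda: func(sample), number=1000)
    print(f"{func.__name__}: {seconds:.4f} s")
```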

score 2 · Answer 4 · answered May 06 '10 at 12:09

Try:

"Adam'sApple".replace("'", '')

One step further, to replace multiple characters with nothing:

import re
print(re.sub(r'''['"x]''', '', '''a'"xb'''))

Yields:

ab

dlamotte

score 1 · Answer 5 · answered May 06 '10 at 12:10

your_string.replace("'", "")

Delan Azabani

score 1 · Answer 6 · answered May 06 '10 at 16:09

As has been pointed out several times now, you can use either replace or regular expressions (most likely you don't need regexes, though). If you also have to make sure that the resulting string is plain ASCII (that is, it doesn't contain funky characters like é, ò, µ, æ or φ), you can additionally do:

>>> u'(like é, ò, µ, æ or φ)'.encode('ascii', 'ignore')
'(like , , ,  or )'
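The snippet above is Python 2, where `encode` returns a byte string that prints like ordinary text. In Python 3 the result is a `bytes` object, so a `decode` step is needed to get a `str` back. A minimal sketch, assuming Python 3; `to_ascii` is a hypothetical helper name:

```python
def to_ascii(text):
    # encode() drops every non-ASCII character; decode() turns the bytes back into str.
    return text.encode("ascii", "ignore").decode("ascii")

print(to_ascii("(like é, ò, µ, æ or φ)"))  # (like , , ,  or )
```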

score 0 · Answer 7 · answered Oct 04 '16 at 13:58

An alternative that takes a string and an array of unwanted chars:

# Function that removes unwanted signs from a string.
# Pass it the string and an array of unwanted chars.
def removeSigns(text, arrayOfChars):
    newstr = ""
    for letter in text:
        # Check the letter against every unwanted char.
        charFound = False
        for char in arrayOfChars:
            if letter == char:
                charFound = True
                break
        # Keep the letter only if no unwanted char matched.
        if not charFound:
            newstr += letter
    return newstr
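The nested loop can probably be condensed with a set lookup and `str.join`, which also avoids building the result by repeated concatenation. This is a sketch of the same idea rather than a drop-in replacement; `remove_signs` and `unwanted_chars` are illustrative names.

```python
def remove_signs(text, unwanted_chars):
    # A set gives constant-time membership tests for each character.
    unwanted = set(unwanted_chars)
    # Keep only the characters that are not in the unwanted set.
    return "".join(ch for ch in text if ch not in unwanted)

print(remove_signs("Adam's Apple!", ["'", "!"]))  # Adams Apple
```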

score 0 · Answer 8 · answered Jun 17 '18 at 17:34

Let's say we have the following list:

states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'south carolina##', 'West virginia?']

Now we will define a function clean_strings():

import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

When we call clean_strings(states), the result will look like:

['Alabama',
'Georgia',
'Georgia',
'Georgia',
'South Carolina',
'West Virginia']

score 0 · Answer 9 · answered Jun 20 '19 at 10:11

I am probably late with this answer, but I think the code below would also work. It is on the aggressive end: it replaces every character other than letters, digits, '.' and '?' with a space:

import re

a = '; niraj kale 984wywn on 2/2/2017'
a = re.sub('[^a-zA-Z0-9.?]', ' ', a)
a = a.replace('  ', ' ').strip()

which will give

'niraj kale 984wywn on 2 2 2017'
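One caveat: replacing a double space with a single space only collapses pairs in one pass, so a run of three or more unwanted characters can still leave extra spaces behind. A possible variant, sketched here with a hypothetical helper named `squash`, splits on whitespace and rejoins the pieces instead:

```python
import re

def squash(text):
    # Replace everything except letters, digits, '.' and '?' with a space.
    text = re.sub(r"[^a-zA-Z0-9.?]", " ", text)
    # split() with no arguments drops leading/trailing whitespace and collapses runs.
    return " ".join(text.split())

print(squash("; niraj kale 984wywn on 2/2/2017"))  # niraj kale 984wywn on 2 2 2017
```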
