
I'm using Python 2.6 and I have a string containing a number of punctuation characters that I'd like to strip out. I've looked at using string.punctuation (a constant, not a function), but unfortunately I want to strip out all punctuation characters except full stops and dashes. In total, there are only 5 punctuation marks I'd like to strip out: ( ) \ " '

Any suggestions? I'd like the most efficient way to do this.

Thanks

Ocaso Protal
thefragileomen

6 Answers

Using str.translate:

s = ''' abc(de)f\\gh"i' '''
print(s.translate(None, r"()\"'"))
# prints " abcdefghi " (the surrounding spaces are kept)

or re.sub:

import re
re.sub(r"[\\()'\"]",'',s)

but str.translate appears to be roughly an order of magnitude faster (about 19x in this run):

In [148]: %timeit (s*1000).translate(None, r"()\"'")
10000 loops, best of 3: 112 us per loop

In [146]: %timeit re.sub(r"[\\()'\"]",'',s*1000)
100 loops, best of 3: 2.11 ms per loop
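The timings above come from a Python 2 IPython session. On Python 3, str.translate no longer accepts a deletechars argument; the deletion table is built with str.maketrans instead. A rough sketch for reproducing the comparison with the standard timeit module follows. The function names are hypothetical, and absolute numbers will depend on the machine and interpreter.

```python
import re
import timeit

# Characters to delete: ( ) \ " '
DELETE_CHARS = "()\\\"'"
# Python 3 replacement for s.translate(None, DELETE_CHARS)
TABLE = str.maketrans("", "", DELETE_CHARS)
# Same character class as the re.sub example above, compiled once
PATTERN = re.compile(r"[\\()'\"]")


def with_translate(s):
    """Delete the unwanted characters via a translation table."""
    return s.translate(TABLE)


def with_regex(s):
    """Delete the unwanted characters via a precompiled regex."""
    return PATTERN.sub("", s)


# Same sample as above, repeated 1000 times
sample = ''' abc(de)f\\gh"i' ''' * 1000

if __name__ == "__main__":
    runs = 100
    for func in (with_translate, with_regex):
        seconds = timeit.timeit(lambda: func(sample), number=runs)
        print(f"{func.__name__}: {seconds / runs * 1e6:.1f} us per call")
```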
unutbu
>>> import re
>>> r = re.compile("[\(\)\\\\'\"]")
>>> r.sub("", "\"hello\" '(world)'\\\\\\")
'hello world'
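The same precompiled approach written with a raw string avoids the quadruple backslash. The sketch below assumes Python 3, and strip_unwanted is a hypothetical helper name. The pattern is compiled once at module level and reused on every call.

```python
import re

# Raw string, so "\\" is a single escaped backslash inside the class;
# the class matches ( ) \ ' and "
UNWANTED = re.compile(r"[()\\'\"]")


def strip_unwanted(text):
    # Replace every match of the precompiled pattern with nothing
    return UNWANTED.sub("", text)


print(strip_unwanted("\"hello\" '(world)'\\\\\\"))  # hello world
print(strip_unwanted("a\\b"))  # ab
```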
cha0site
  • This doesn't remove backslashes from the original string. `r.sub('', 'a\\b') --> 'a\\b'` – Andrew Clark Jan 13 '12 at 22:25
  • Right, there are a lot of answers on this one, but I think a compiled regexp would be the most *efficient* solution. And always remember, “Give a man a regular expression and he’ll match a string… but by teaching him how to create them, you’ve given him enough rope to hang himself” – cha0site Jan 13 '12 at 22:29
  • @Paulo: I don't like raw strings, they remind me of paths in Windows ;) – cha0site Jan 13 '12 at 22:31
  • @Paulo: I find that it depends on how many backslashes you need. `r"C:\Program Files\SomeCompany\SomeProgram Version 7\Internals\foobar.bla"` is obviously much nicer than the alternative, but a lot of times I need `"\t\n\0\xad\xde\xef\xbe"`, too... – cha0site Jan 13 '12 at 22:42

You can use str.translate(table[, deletechars]) with table set to None, which will result in all characters from deletechars being removed from the string:

s.translate(None, r"()\"'")

Some examples:

>>> "\"hello\" '(world)'".translate(None, r"()\"'")
'hello world'
>>> "a'b c\"d e(f g)h i\\j".translate(None, r"()\"'")
'ab cd ef gh ij'
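The examples above are Python 2. In Python 3, str.translate takes only a mapping, and the characters to delete go in the third argument of str.maketrans. A minimal sketch of the equivalent, assuming Python 3 (DELETE_TABLE is just an illustrative name):

```python
# Python 3: the third argument of str.maketrans lists characters to delete
DELETE_TABLE = str.maketrans("", "", "()\\\"'")

print("\"hello\" '(world)'".translate(DELETE_TABLE))   # hello world
print("a'b c\"d e(f g)h i\\j".translate(DELETE_TABLE))  # ab cd ef gh ij
```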
Andrew Clark

You could make a list of all the characters you don't want:

unwanted = ['(', ')', '\\', '"', '\'']

Then you could make a function strip_punctuation(s) like so:

def strip_punctuation(s): 
    for u in unwanted: 
        s = s.replace(u, '')
    return s
tlehman

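You can create a dict of all the characters you want to replace, mapping each one to the replacement of your choice (here, the empty string):

char_replace = {"'": "", "(": "", ")": "", "\\": "", '"': ""}

# Python 2; use .items() on Python 3. Avoid naming the variable `string`,
# which shadows the string module.
for old, new in char_replace.iteritems():
    text = text.replace(old, new)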
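On Python 3 the same dict can be handed to str.maketrans, which accepts a mapping from single characters to a replacement string or None (delete), so the whole replacement happens in one translate call instead of one replace call per character. A sketch assuming Python 3; text is an example input:

```python
# Map each unwanted character to None (delete); a string value would replace it
char_replace = {"'": None, "(": None, ")": None, "\\": None, '"': None}
table = str.maketrans(char_replace)

text = "a'b (c) d\\e \"f\""
print(text.translate(table))  # ab c de f
```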
RanRag
my_string = r'''\(""Hello ''W\orld)'''
strip_chars = r'''()\'"'''

using a generator expression:

''.join(x for x in my_string if x not in strip_chars)

using filter:

''.join(filter(lambda x: x not in strip_chars, my_string))

output:

Hello World
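The same idea can be wrapped in a reusable function. The sketch below assumes Python 3, and strip_chars is a hypothetical name. It stores the unwanted characters in a frozenset for constant-time membership checks and passes join() a list comprehension, since join() builds a list from its argument anyway.

```python
# Characters to drop: ( ) \ ' "
STRIP_CHARS = frozenset("()\\'\"")


def strip_chars(text, unwanted=STRIP_CHARS):
    # Keep every character that is not in the unwanted set
    return "".join([ch for ch in text if ch not in unwanted])


print(strip_chars(r'''\(""Hello ''W\orld)'''))  # Hello World
```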
Corey Goldberg