I have a string from which I would like to remove all punctuation. I currently use:

import string
translator = str.maketrans('','', string.punctuation)
name = name.translate(translator)

However, for strings that are names, this also removes the hyphen, which I would like to keep. For instance, "\Fred-Daniels!" should become "Fred-Daniels".

How can I modify the above code to achieve this?

Christian Dean
labjunky
  • `string.punctuation` is itself a string. Can't you just remove `-` from `string.punctuation`? – jamesdlin Aug 01 '17 at 01:11
  • Are you looking for a regex solution or any solution? Your tag has regex and there is an answer already that does this using regex. – idjaw Aug 01 '17 at 01:12
  • If you agree with [this](https://stackoverflow.com/questions/21209024/python-regex-remove-all-punctuation-except-hyphen-for-unicode-string) solution that answers your question I will dupe it. Otherwise, the regex tag should be removed and you should specify that you want a non-regex solution just to ensure proper categorization is preserved with the questions. – idjaw Aug 01 '17 at 01:17

3 Answers

If you'd like to exclude some punctuation characters from string.punctuation, you can simply remove the ones you want to keep before building the translation table:

>>> from string import punctuation
>>> from re import sub
>>> 
>>> string = "\\Fred-Daniels!"
>>> translator = str.maketrans('', '', sub('-', '', punctuation))
>>> string
'\\Fred-Daniels!'
>>> string = string.translate(translator)
>>> string
'Fred-Daniels'

Note: if you only want to exclude one or two characters, str.replace is enough. For more than that, it's best to stick with re.sub.
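As a rough sketch of the two options side by side: the first keeps only the hyphen via str.replace, the second keeps several characters at once via re.sub with a character class. The variable names and the extra apostrophe are illustrative assumptions, not part of the original question.

```python
import re
import string

# Option 1: str.replace, fine when excluding a single character.
remove_via_replace = string.punctuation.replace("-", "")

# Option 2: re.sub with a character class, handy when excluding several.
# Keeping the apostrophe as well is an illustrative extra.
remove_via_regex = re.sub(r"[-']", "", string.punctuation)

name = "\\O'Neil-Daniels!"
print(name.translate(str.maketrans("", "", remove_via_replace)))  # ONeil-Daniels
print(name.translate(str.maketrans("", "", remove_via_regex)))    # O'Neil-Daniels
```

Putting the hyphen first inside the character class keeps it literal, so no escaping is needed.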

Christian Dean
import string

PUNCT_TO_REMOVE = string.punctuation
print(PUNCT_TO_REMOVE) # Output : !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

# Now suppose you don't want _ in your PUNCT_TO_REMOVE

PUNCT_TO_REMOVE = PUNCT_TO_REMOVE.replace("_","")
print(PUNCT_TO_REMOVE) # Output : !"#$%&'()*+,-./:;<=>?@[\]^`{|}~
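To tie this back to the question, the reduced punctuation string can be fed to str.maketrans. The helper below is a minimal sketch; the function name `remove_punctuation` and its `keep` parameter are hypothetical, chosen only for illustration.

```python
import string


def remove_punctuation(text, keep="-"):
    """Delete ASCII punctuation from text, except the characters in keep."""
    # Drop the characters we want to preserve from the removal set.
    to_remove = "".join(ch for ch in string.punctuation if ch not in keep)
    # A translation table whose third argument lists characters to delete.
    return text.translate(str.maketrans("", "", to_remove))


print(remove_punctuation("\\Fred-Daniels!"))               # Fred-Daniels
print(remove_punctuation("snake_case-name!", keep="-_"))   # snake_case-name
```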
Sachin Rastogi

Depending on the use case, it could be safer and clearer to explicitly list the valid characters:

>>> name = '\\test-1.'
>>> valid_characters = 'abcdefghijklmnopqrstuvwxyz1234567890- '
>>> filtered_name = ''.join([ x for x in name if x.lower() in valid_characters ])
>>> print(filtered_name)
test-1

Note that many people have names that include punctuation though, like "Mary St. Cloud-Stevens", "Jim Chauncey, Jr.", etc.
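One possible way to accommodate such names is to widen the allowlist and rely on str.isalnum, which also accepts non-ASCII letters. This is only a sketch: the function name `filter_name` and the particular set of extra characters are assumptions, and the right set depends on your data.

```python
def filter_name(name, extra_allowed="-.,' "):
    """Keep letters, digits, and a small allowlist of name punctuation."""
    # str.isalnum is Unicode-aware, so accented letters are preserved.
    return "".join(ch for ch in name if ch.isalnum() or ch in extra_allowed)


print(filter_name("\\Mary St. Cloud-Stevens!"))  # Mary St. Cloud-Stevens
print(filter_name("Jim Chauncey, Jr."))          # Jim Chauncey, Jr.
print(filter_name("\\test-1.", extra_allowed="- "))  # test-1
```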

Brian