
I'm working on a project that involves parsing pages of text. I've written the following function to remove certain punctuation from a word and convert it to lowercase:

def format_word(word):
    return word.replace('.', '').replace(',', '').replace('\"', '').lower()

Is there any way to combine all of the calls to .replace() into one? This looks rather ugly the way it is! The only way I can think of doing it is as follows:

def format_word(word):
    for punct in '.,\"':
        word.replace(punct, '')
    return word.lower()
falsetru
Ryan
  • unrelated: you don't need to escape `"` inside `'` string literals. – jfs Jan 06 '15 at 08:41
  • related: [Best way to strip punctuation from a string in Python](http://stackoverflow.com/q/265960/4279) – jfs Jan 06 '15 at 08:45
  • related: [Remove punctuation from Unicode formatted strings](http://stackoverflow.com/q/11066400/4279) – jfs Jan 06 '15 at 08:45
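The second linked question concerns Unicode text. Here is a rough sketch that is not taken from that question. It uses the standard `unicodedata` module to drop every character whose Unicode general category is punctuation (categories starting with "P"). This removes far more than the three characters asked about. The helper name `strip_unicode_punct` is made up for illustration.

```python
import unicodedata

def strip_unicode_punct(word):
    # Keep each character whose general category does not start with "P"
    # (Pc, Pd, Ps, Pe, Pi, Pf and Po are the punctuation categories).
    return ''.join(ch for ch in word
                   if not unicodedata.category(ch).startswith('P'))

# Guillemets and curly quotes are removed along with , and .
print(strip_unicode_punct('«Hello», “world”.').lower())  # hello world
```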

4 Answers


You can use str.translate if you want to remove characters:

In Python 2.x:

>>> 'Hello, "world".'.translate(None, ',."')
'Hello world'

In Python 3.x:

>>> 'Hello, "world".'.translate(dict.fromkeys(map(ord, ',."')))
'Hello world'
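Wrapped back into the asker's `format_word`, a Python 3 sketch might build the deletion table once with `str.maketrans`. Its optional third argument lists characters that `translate()` should delete. The module-level name `_PUNCT_TABLE` is an illustrative choice, not part of the answer above.

```python
# Python 3 only: the third argument of str.maketrans lists characters
# that translate() maps to None, i.e. deletes.
_PUNCT_TABLE = str.maketrans('', '', '.,"')

def format_word(word):
    # Delete the punctuation, then lowercase the result.
    return word.translate(_PUNCT_TABLE).lower()

print(format_word('Hello, "World".'))  # hello world
```

Building the table once at module level avoids recreating it on every call, which matters when parsing whole pages word by word.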
glglgl
falsetru
  • upvote. You could also emulate `.lower()` in the same `.translate()` call. – jfs Jan 06 '15 at 08:42
  • [code example that shows how to use `translate()` to emulate `lower()` for ascii data](http://ideone.com/VpW59X) – jfs Jan 06 '15 at 08:59
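The comment above suggests emulating `.lower()` inside the same `translate()` call. The linked ideone example is not reproduced here. As one possible Python 3 sketch, a single `str.maketrans` table can map ASCII uppercase letters to lowercase and delete the punctuation at the same time. Only ASCII letters are lowercased this way, so a character such as `É` passes through unchanged.

```python
import string

# Python 3 only: one table that lowercases ASCII letters (first two
# arguments) and deletes the listed punctuation (third argument).
_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase, '.,"')

def format_word(word):
    # No separate .lower() call: the table already maps A-Z to a-z.
    return word.translate(_TABLE)

print(format_word('Hello, "World".'))  # hello world
```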

You can use the `re` module for that:

>>> import re
>>> def format_word(word):
...     return re.sub(r'[,."]', "", word)
...
>>> print(format_word('asdf.,"asdf'))
asdfasdf
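If the function is called once per word over many pages, one option is to compile the pattern once and restore the `.lower()` call from the question. This is a sketch; the name `_PUNCT_RE` is illustrative. Inside a character class `[...]`, the `.` is a literal dot, so it needs no escaping.

```python
import re

# Compile the character class once; inside [...] "." matches a literal dot.
_PUNCT_RE = re.compile(r'[.,"]')

def format_word(word):
    # Remove every match, then lowercase, as the original function did.
    return _PUNCT_RE.sub('', word).lower()

print(format_word('Asdf.,"asdf'))  # asdfasdf
```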
nu11p01n73R

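You are quite close. Strings are immutable, so `.replace()` returns a new string instead of modifying `word` in place. If you not only call `.replace()` but also assign its result back to `word`, you are done: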

def format_word(word):
    for punct in '.,\"':
        word = word.replace(punct, '')
    return word.lower()
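The same "feed each result into the next call" idea can also be written as a single expression with `functools.reduce`. This is closer to the literal request to combine the `.replace()` calls into one. It is an alternative sketch, not part of the answer above.

```python
from functools import reduce

def format_word(word):
    # reduce applies replace once per punctuation character, passing the
    # string returned by each step into the next step.
    return reduce(lambda w, punct: w.replace(punct, ''), '.,"', word).lower()

print(format_word('Hello, "World".'))  # hello world
```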
glglgl

You can do this using regular expressions:

import re
re.sub("[.,\"]", "", "\"wo,rd.")  # returns 'word'
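Extending the character class by hand gets error-prone once characters like `]` or `\` are involved. Here is a hypothetical variant that strips all ASCII punctuation rather than just `.`, `,` and `"`. It builds the class from `string.punctuation` and lets `re.escape` make each character safe inside the brackets. The name `strip_ascii_punct` is made up for illustration.

```python
import re
import string

# re.escape escapes characters such as ], \ and - so they are treated
# literally inside the [...] character class.
_ALL_PUNCT_RE = re.compile('[' + re.escape(string.punctuation) + ']')

def strip_ascii_punct(word):
    return _ALL_PUNCT_RE.sub('', word)

print(strip_ascii_punct('"wo,rd." [it\'s]'))  # word its
```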
bigblind