I have been working on a file that contains a lot of punctuation, and I need to ignore the punctuation so that I can count the actual length of each word.

Example:

Is this stack overflow! ---> Is this stack overflow

To do this I wrote a separate case for each punctuation character, which made my code slow. So I am looking for a more efficient way to do the same thing using a module or function.

Code snippet:

with open(file_name,'r') as f:
     for line in f:
         for word in line.split():
            #print word
            # Handling punctuation
            word = word.replace('.','')
            word = word.replace(',','')
            word = word.replace('!','')
            word = word.replace('(','')
            word = word.replace(')','')
            word = word.replace(':','')
            word = word.replace(';','')
            word = word.replace('/','')
            word = word.replace('[','')
            word = word.replace(']','')
            word = word.replace('-','')

This is the logic I have written so far. Is there any way to shorten it?

SaiKiran

2 Answers

This question is a "classic", but a lot of the answers don't work in Python 3, because string.maketrans was removed in Python 3 and str.translate no longer accepts a second argument listing characters to delete. A Python 3-compliant solution is to use string.punctuation to get the punctuation characters and str.translate to remove them:

import string
"hello, world !".translate({ord(k):"" for k in string.punctuation})

results in:

'hello world '

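In Python 3, the argument of translate is a mapping such as a dictionary. Each key is the Unicode code point of a character (as returned by ord), and each value is its replacement; the empty string used here deletes the character. I created the dictionary using a dictionary comprehension.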
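The same mapping can also be built once with str.maketrans and reused for every word, which fits the word-length counting in the question. This is only a sketch: the names PUNCT_TABLE and word_lengths are made up for illustration, and it assumes that only the ASCII characters in string.punctuation need to be removed.

```python
import string

# Translation table that deletes every character in string.punctuation.
# The third argument of str.maketrans lists characters to map to None.
PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def word_lengths(line):
    """Return the length of each word in `line`, ignoring punctuation."""
    stripped = (word.translate(PUNCT_TABLE) for word in line.split())
    # Skip tokens that consisted only of punctuation and are now empty.
    return [len(word) for word in stripped if word]


print(word_lengths("Is this stack overflow!"))  # [2, 4, 5, 8]
```

Building the table once outside the loop avoids recreating the dictionary for every word in the file.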

Jean-François Fabre

You can use a regular expression to remove every character that belongs to a character class:

>>> import re
>>> string = 'Is this stack overflow!'
>>> re.sub(r'[]!,:)([/-]', '', string)
'Is this stack overflow'
  • []!,:)([/-] is a character class that matches ] or ! or , and so on. Each match is replaced with ''.
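If the goal is to strip all ASCII punctuation instead of a hand-picked set, the character class can be generated from string.punctuation. The following is a sketch under that assumption; PUNCT_RE and strip_punctuation are illustrative names, not part of any library.

```python
import re
import string

# re.escape backslash-escapes characters such as ], \, ^ and - so that
# they are treated literally inside the [...] character class.
PUNCT_RE = re.compile('[' + re.escape(string.punctuation) + ']')


def strip_punctuation(text):
    """Return `text` with every ASCII punctuation character removed."""
    return PUNCT_RE.sub('', text)


print(strip_punctuation("Is this stack overflow!"))  # Is this stack overflow
```

Compiling the pattern once and reusing it keeps the per-line cost low when processing a whole file.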
nu11p01n73R