Python remove punctuation from text

Question

I am reading a thousand-line Italian text and building a dictionary of unique words. I have tried two methods of removing the punctuation. The first uses string.punctuation:

import string

for p in string.punctuation:
    word = word.replace(p, '')

The second checks each character against an explicit list:

for line in f:
    for word in line.split():
        stripped_text = ""
        for char in word:
            if char in '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~>><<<<?>>?123456789':
                char = ''
            # append outside the if, so ordinary letters are kept too
            stripped_text += char

My problem is that the resulting dictionary still contains punctuation:

{'<<Dicerolti': 1,'piage>>.': 1,'succia?>>.': 1,…}

Any ideas, please?
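For reference, here is a minimal runnable sketch of the second method built into a complete word counter. It is an illustration, not the asker's code: `count_words` and the sample lines are hypothetical, and the character set is `string.punctuation` (which covers ASCII characters only) rather than the hand-written list above.

```python
import string

# Hypothetical helper: drop ASCII punctuation from each word, then count words.
PUNCTUATION = set(string.punctuation)


def count_words(lines):
    """Return a dict mapping each cleaned word to its number of occurrences."""
    counts = {}
    for line in lines:
        for word in line.split():
            # Keep only the characters that are not punctuation.
            stripped = ''.join(ch for ch in word if ch not in PUNCTUATION)
            if stripped:  # skip tokens that were only punctuation, such as '>>'
                counts[stripped] = counts.get(stripped, 0) + 1
    return counts


sample = ["<<Dicerolti porta, porta.", "piage>>. succia?>>. porta: >>"]
print(count_words(sample))  # {'Dicerolti': 1, 'porta': 3, 'piage': 1, 'succia': 1}
```

With a real file, the same function can take the open file object directly, since iterating over it yields one line at a time. The important detail is that the cleaned string, not the original `word`, is what goes into the dictionary.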

Sorry, the returned dictionary did not come out correctly: {'<>.': 1, 'Nacqui': 1, 'angelo': 1, 'condotta.': 1, 'i': 258, 'voi': 91, 'digiunto.': 1, 'quei:': 1, 'porta.': 2, 'porta,': 5, 'via.': 2, 'consorto': 1, 'via,': 14, 'fosca,': 1, 'vince': 10, 'Lancialotto': 1, 'fosca!': 1, 'vinci': 2, 'voi?>>;': 1, — user1478335, Nov 07 '13 at 15:03
http://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string-in-python — jbat100, Nov 07 '13 at 15:36
Thank you for this. I have looked at the solutions in your reference, but I am somewhat lost. I am wondering whether the punctuation that is not removed is peculiar to Italian, particularly << and >>, which Italian uses where English uses quotation marks. I tried word.translate(None, string.punctuation), but I get a TypeError: it takes one argument, but two were given. Also, in the dictionary above porta appears four times: as 'porta;', 'porta:', 'porta.' and 'porta,'. So my theory rather falls apart. I need more help if possible, please — user1478335, Nov 07 '13 at 16:48
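The TypeError in the comment above is consistent with running the Python 2 form `word.translate(None, string.punctuation)` under Python 3, where `str.translate` takes a single table argument. A minimal Python 3 sketch builds that table with `str.maketrans`, whose third argument lists characters to delete. ASCII `<` and `>` are already in `string.punctuation`, but that constant is ASCII-only, so if the file really contains the single guillemet characters « and », they have to be added explicitly; adding them here is an assumption about the input, and `strip_punct` is a hypothetical name.

```python
import string

# Table that deletes every ASCII punctuation character plus the guillemets
# « and » (added as an assumption about the input; they are not ASCII).
DELETE_PUNCT = str.maketrans('', '', string.punctuation + '«»')


def strip_punct(word):
    """Return word with every character listed in the table removed."""
    return word.translate(DELETE_PUNCT)


print(strip_punct('<<Dicerolti'))  # Dicerolti
print(strip_punct('«Dicerolti»'))  # Dicerolti
print(strip_punct('porta;'))       # porta
```

Building the table once and reusing it for every word avoids repeating the per-character loop from the question.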

score 1 · Accepted Answer · answered Nov 07 '13 at 17:06

You could use the re module for this, with a little printf-style trick to build a regex character class that matches any punctuation character for replacement.

import re
import string

a = '>>some_crazy_string..!'
# re.escape keeps characters such as \ and ] literal inside the [...] class
print(re.sub('[%s]' % re.escape(string.punctuation), '', a))

prints out

somecrazystring

I've used this trick a couple of times for 'anonymizing' log files.
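Applied to the question's task, a sketch (with the hypothetical names `PUNCT_RE` and `word_counts`) compiles the pattern once, cleans each whole line before splitting, and counts words with `collections.Counter`. `re.escape` guarantees that every character of `string.punctuation` is a literal inside the class; interpolated unescaped, the `\]` pair is read as an escaped `]`, so backslashes in the text would survive. Substituting a space instead of the empty string is a choice made here, not in the answer, so that words joined only by punctuation, such as the elision l'anima, are split rather than fused.

```python
import collections
import re
import string

# Character class matching any ASCII punctuation character.
PUNCT_RE = re.compile('[%s]' % re.escape(string.punctuation))


def word_counts(lines):
    """Count words after replacing punctuation in each line with spaces."""
    counts = collections.Counter()
    for line in lines:
        # A space (not '') keeps words that punctuation joined from merging.
        counts.update(PUNCT_RE.sub(' ', line).split())
    return counts


counts = word_counts(["<<Dicerolti porta, porta.", "l'anima porta:>>"])
print(counts.most_common(1))  # [('porta', 3)]
```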

