87

I want to remove all punctuation marks from a text file using the .translate() method. It works well under Python 2.x, but under Python 3.4 it doesn't seem to do anything.

My code is as follows; the output is the same as the input text.

import string
fhand = open("Hemingway.txt")
for fline in fhand:
    fline = fline.rstrip()
    print(fline.translate(string.punctuation))
wkl
cybujan

6 Answers

200

You have to create a translation table with str.maketrans and pass it to the str.translate method.

In Python 3, maketrans is a static method on the str type, so you can use it to build a table that maps each punctuation character you want removed to None.

import string

# Thanks to Martijn Pieters for this improved version

# This uses the 3-argument version of str.maketrans
# with arguments (x, y, z) where 'x' and 'y'
# must be equal-length strings and characters in 'x'
# are replaced by characters in 'y'. 'z'
# is a string (string.punctuation here)
# where each character in the string is mapped
# to None
translator = str.maketrans('', '', string.punctuation)

# This is an alternative that creates a dictionary mapping
# of every character from string.punctuation to None (this will
# also work)
#translator = str.maketrans(dict.fromkeys(string.punctuation))

s = 'string with "punctuation" inside of it! Does this work? I hope so.'

# pass the translator to the string's translate method.
print(s.translate(translator))

This should output:

string with punctuation inside of it Does this work I hope so
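Applied to the question's loop over a file, the same table can be built once and reused for every line. The following is a minimal sketch, not the asker's actual program: the helper name `strip_punctuation` is made up for illustration, and `io.StringIO` stands in for the opened file so the example runs on its own.

```python
import io
import string

# Build the table once; every character in string.punctuation maps to None.
TRANSLATOR = str.maketrans('', '', string.punctuation)


def strip_punctuation(lines):
    """Yield each line with trailing whitespace and ASCII punctuation removed.

    Hypothetical helper; `lines` can be any iterable of strings,
    including an open text file.
    """
    for fline in lines:
        yield fline.rstrip().translate(TRANSLATOR)


# Stand-in for the real file (an assumption made so the sketch is runnable).
sample = io.StringIO('The sun also rises, doesn\'t it?\n"Yes," he said.\n')
for cleaned in strip_punctuation(sample):
    print(cleaned)
```

With a real file, the loop would presumably become `with open("Hemingway.txt") as fhand:` followed by `for cleaned in strip_punctuation(fhand):`.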
wkl
  • 7
    This is nicely done. It's unfortunate that the top Google results for this topic are deprecated, slower, or more difficult to follow. – rurp Apr 26 '16 at 05:38
  • 1
    It seems that `string.punctuation` does not include quotes. How would we tweak this code to trim by the keys in `string.punctuation` as well as user specified characters? An or statement? – Arash Howaida Dec 28 '16 at 03:28
  • 1
    @ArashHowaida `string.punctuation` includes quotes (both double and single) - even in my example it strips out the double quotes. If you want to customize what gets stripped in addition to `string.punctuation`, just concatenate `string.punctuation` with a string of characters you also want removed, like `translator = str.maketrans({key: None for key in string.punctuation + 'abc'})` if you wanted to remove punctuation and any occurrences of the characters `a`, `b`, or `c`. – wkl Dec 28 '16 at 05:48
  • My quotes must have some encoding issues, good to know. Thank you! – Arash Howaida Dec 28 '16 at 06:00
  • 1
    `str.maketrans('', '', string.punctuation)` would also work. There is no need to loop, at any rate, even `str.maketrans(dict.fromkeys(string.punctuation))` would be better here. – Martijn Pieters Jan 16 '17 at 19:22
  • @MartijnPieters thanks for the improved versions, I'll update my answer – wkl Jan 16 '17 at 19:44
  • Excellent! Note that I personally prefer the three-argument variant: why create a whole dictionary for `str.maketrans()` to transform into another dictionary? The three-argument version only passes a very cheap empty string (twice) and the pre-existing `string.punctuation` object. – Martijn Pieters Jan 16 '17 at 19:53
  • The `str.maketrans('', '', string.punctuation)` method works fine, but keep in mind that punctuation is removed, not replaced with whitespace, in case that's what you're looking for. – Cesar Flores Oct 31 '21 at 12:47
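Two variations raised in the comments can be sketched together: stripping extra characters beyond `string.punctuation`, and replacing punctuation with a space rather than deleting it. This is only an illustration; the typographic quotes used as "extra" characters are an arbitrary choice (they are not part of the ASCII-only `string.punctuation`), and the sample string is invented.

```python
import string

# Standard ASCII punctuation plus user-chosen extras (here: curly quotes).
extra = '\u201c\u201d\u2018\u2019'
to_strip = string.punctuation + extra

# Variant 1: delete the characters outright (three-argument form).
delete_table = str.maketrans('', '', to_strip)

# Variant 2: map each character to a single space (two-argument form;
# both arguments must have the same length).
space_table = str.maketrans(to_strip, ' ' * len(to_strip))

s = 'well-known \u201cfact\u201d: it works!'
print(s.translate(delete_table))                   # wellknown fact it works
# split/join collapses the runs of spaces left behind by the replacement.
print(' '.join(s.translate(space_table).split()))  # well known fact it works
```

Which variant is appropriate depends on whether hyphenated or punctuation-joined words should be merged or split.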
25

The call signature of str.translate has changed and apparently the parameter deletechars has been removed. You could use

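import re
import string
# re.escape keeps characters such as ], ^, - and \ literal inside the [...] class
fline = re.sub('[' + re.escape(string.punctuation) + ']', '', fline)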

instead, or create a table as shown in the other answer.
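If the regular-expression route is used on many lines, the pattern can be compiled once up front. A small sketch under that assumption; the names `PUNCT_RE` and `remove_punctuation` are illustrative, and `re.escape` is applied so that no punctuation character is interpreted as character-class syntax.

```python
import re
import string

# Escape the punctuation so characters like ], ^, - and \ stay literal
# inside the character class, then compile the pattern a single time.
PUNCT_RE = re.compile('[' + re.escape(string.punctuation) + ']')


def remove_punctuation(text):
    """Return `text` with every ASCII punctuation character removed."""
    return PUNCT_RE.sub('', text)


print(remove_punctuation("Isn't this (really) simple?"))  # Isnt this really simple
```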

elzell
  • 1
    (@birryree example (http://stackoverflow.com/a/34294398/1656850) begs to disagree with your deprecation edict on string.translate ;-) – boardrider Dec 17 '15 at 10:38
  • You are right. I misunderstood the documentation on that point. Only the call signature has changed: str.translate takes only a table as parameter and no longer deletechars (as seen in birryree's answer). I will edit my answer accordingly. – elzell Dec 17 '15 at 10:51
  • This is the only solution I could find that is Python 2.7/3.6 compatible. I could not find any solution to use translate() that would work for both Python 2.7 and 3.6. – proximous May 29 '18 at 14:54
25

In Python 3.x, it can be done using:

import string
# make the translator object
translator = str.maketrans('', '', string.punctuation)
string_name = string_name.translate(translator)
Mayank Kumar
3

I just compared the three methods by speed. translate is about 10 times slower than re.sub (with precompilation), and str.replace is about 3 times faster than re.sub. By str.replace I mean:

for ch in string.punctuation:
    s = s.replace(ch, "")
imbolc
  • 2
    I think you did it wrong. I ran the tests (with the translate part adapted for Python 3) from http://stackoverflow.com/a/266162/4249707 on Python 3.6.0b4 and, like many years ago, replace sucks. My results: sets: 2.7033574236556888, regex: 0.9831533581018448, translate: 1.837449918501079, replace: 5.498765277676284 – El Ruso Feb 07 '17 at 21:05
  • str.translate() = 2.35 seconds, regular expressions = 88.8 seconds, for loop with str.replace() = 20.6 seconds. (https://datagy.io/) – Gospel77 Feb 25 '22 at 11:07
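Since the reported timings disagree, it may be easiest to measure on your own interpreter and data. The sketch below is one possible `timeit` harness, not the benchmark any of the posters ran; the sample text, repetition count, and function names are all arbitrary choices, and no particular ranking is implied.

```python
import re
import string
import timeit

TABLE = str.maketrans('', '', string.punctuation)
PUNCT_RE = re.compile('[' + re.escape(string.punctuation) + ']')


def with_translate(s):
    return s.translate(TABLE)


def with_regex(s):
    return PUNCT_RE.sub('', s)


def with_replace(s):
    for ch in string.punctuation:
        s = s.replace(ch, '')
    return s


# Arbitrary sample text; results depend heavily on input and Python version.
sample = 'string with "punctuation" inside of it! Does this work? I hope so. ' * 100

for func in (with_translate, with_regex, with_replace):
    seconds = timeit.timeit(lambda: func(sample), number=1000)
    print(f'{func.__name__}: {seconds:.3f} s')
```

All three functions should return the same string, so only the timing differs between them.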
1

Late answer, but to remove all punctuation on Python >= 3.6 (the first version with f-strings), you can also use:

import re, string

clean_string = re.sub(rf"[{string.punctuation}]", "", dirty_string)


Pedro Lobito
0

In Python 3.6 you can use the following to remove punctuation:

import string

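your_string.translate(str.maketrans('', '', string.punctuation))

Here str.maketrans() is called with three arguments: the first two are empty strings, and the third is a string containing the punctuation characters we want to remove. Every character in the third argument is mapped to None in the resulting table, which tells translate() to delete it. Note that translate() returns a new string and leaves your_string itself unchanged.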
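To see what the table actually contains, it can be printed for a short third argument. A tiny illustration with an invented sample string: the table is a plain dict keyed by Unicode code points, and `translate()` returns a new string rather than modifying the original.

```python
# Only '!' and '?' are deleted in this small example.
table = str.maketrans('', '', '!?')
print(table)        # {33: None, 63: None}

your_string = 'Really?! Yes!'
cleaned = your_string.translate(table)
print(cleaned)      # Really Yes
print(your_string)  # Really?! Yes!  (the original is unchanged)
```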

Additionally, you can view the punctuation attribute that comes with the string library by running:

print(string.punctuation)
Gospel77