0

I have a list of words from which I want to remove all punctuation. Here is my code:

def removePunctuation(words):
    return set([s.translate(None, string.punctuation) for s in words])

wordsStripped = removePunctuation(words)

I am getting the following error:

TypeError: translate() takes exactly one argument (2 given)

I've been trying a few different ways to do this but with no luck. Surely there's an easier way? I'm new to Python, so excuse me if this is a bad question. Any help would be greatly appreciated.

Dan Murphy
  • 5
    I suggest that you read the documentation for translate(): https://docs.python.org/3/library/stdtypes.html#str.translate – Code-Apprentice Oct 23 '18 at 14:45
  • 1
    You probably want to build a translation table with `str.maketrans` first. See: https://stackoverflow.com/questions/41535571/how-to-explain-the-str-maketrans-function-in-python-3-6/41536036#41536036 – Patrick Haugh Oct 23 '18 at 14:49
  • I think he got it from https://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string-in-python which appears to be a very respected answer. – UtahJarhead Oct 23 '18 at 14:52
  • 1
    I think your confusion derives from using Python 2 syntax with Python 3 – Chris_Rands Oct 23 '18 at 14:53
  • Tested @Chris_Rands statement and I'm sure he's right. The example @OP went with was written for python 2, but he's got `python-3.x` tagged. – UtahJarhead Oct 23 '18 at 14:56
  • 1
    The accepted answer in the dupe appears to me to use the exact same form of `translate()` used in the OP here which is from Python 2. The error that the OP asks about here indicates that they are using Python 3. The signature for this function has changed. This dupe seems less than helpful for the current question. – Code-Apprentice Oct 24 '18 at 07:56
  • The best way to make the required translation table to remove ASCII punctuation in Python 3 is to do `table = str.maketrans('', '', string.punctuation)`, as shown in krinker's answer in the linked duplicate target question page. – PM 2Ring Oct 24 '18 at 08:16

2 Answers

4
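import string

# Build the table once; the third argument lists the characters to delete.
trans_table = str.maketrans("", "", string.punctuation)

def removePunctuation(words):
    # In Python 3, translate() takes only the translation table.
    return set([s.translate(trans_table) for s in words])

wordsStripped = removePunctuation(words)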
iDrwish
  • 1
    `string.maketrans` was deprecated in Python 3.1, and removed in more recent versions. You should use `str.maketrans` instead. The 3 argument version is the most readable in my opinion: `str.maketrans('', '', string.punctuation)` – Patrick Haugh Oct 23 '18 at 14:58
  • Looked up what @PatrickHaugh said and it makes sense to me. `The string.maketrans() function is deprecated and is replaced by new static methods, bytes.maketrans() and bytearray.maketrans(). This change solves the confusion around which types were supported by the string module. Now, str, bytes, and bytearray each have their own maketrans and translate methods with intermediate translation tables of the appropriate type.` found at https://docs.python.org/3/whatsnew/3.1.html – UtahJarhead Oct 23 '18 at 15:00
  • In this example, `maketrans()` is unnecessary. You can use the dict comp directly. – Code-Apprentice Oct 23 '18 at 15:02
  • Note that `None` can be used in place of `""` in `trans_table`. – Code-Apprentice Oct 24 '18 at 08:03
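
The dict-based table that the last two comments refer to could look like the sketch below. `str.translate` accepts any mapping from Unicode code points to replacements, and mapping a code point to `None` deletes that character. The names `table` and `remove_punctuation` and the sample words are made up for illustration; they are not from the original answer.

```python
import string

# Map each punctuation character's code point to None, which tells
# str.translate() to delete it. This builds the same table as
# str.maketrans("", "", string.punctuation).
table = {ord(c): None for c in string.punctuation}

def remove_punctuation(words):
    # Hypothetical helper name; returns a set, like the answer above.
    return {w.translate(table) for w in words}

sample = ["hello,", "world!", "it's"]  # made-up input for illustration
stripped = remove_punctuation(sample)
print(stripped)
```

Building the table once at module level, rather than inside the function, avoids recreating it on every call.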
1

You could also just do this:

words_stripped = ''.join(c for c in s if not c in string.punctuation)

Disclaimer: the timings below use Python 2 syntax in an IPython shell. The signature of translate changed in Python 3, and the code in your question was written for Python 2.

Addressing timing as mentioned by @Chris_Rands in the comment to this answer:

In [17]: %timeit s.translate(None, string.punctuation)
100000 loops, best of 3: 15.6 µs per loop

In [18]: %timeit ''.join(c for c in s if not c in string.punctuation)
1000 loops, best of 3: 1.04 ms per loop

In [19]: %timeit ''.join(c for c in s if not c in punctuation_set)
1000 loops, best of 3: 632 µs per loop

This was done with s set to five paragraphs generated here: https://www.lipsum.com/feed/html
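
For Python 3, a rough equivalent of the comparison above could be sketched with the standard `timeit` module. Two things here are assumptions: the original post does not show how `punctuation_set` was built (a set of `string.punctuation` is assumed), and the text in `s` is a short placeholder rather than the five generated paragraphs. The function names are made up, and absolute timings will differ by machine.

```python
import string
import timeit

# Assumption: punctuation_set is not defined in the original post;
# a set built from string.punctuation is the most likely definition.
punctuation_set = set(string.punctuation)
table = str.maketrans("", "", string.punctuation)

# Placeholder text standing in for the five lorem ipsum paragraphs.
s = "Lorem ipsum, dolor sit amet; consectetur adipiscing elit! " * 50

def with_translate():
    return s.translate(table)

def with_str_membership():
    return "".join(c for c in s if c not in string.punctuation)

def with_set_membership():
    return "".join(c for c in s if c not in punctuation_set)

# Time each variant over the same number of runs and print the totals.
for fn in (with_translate, with_str_membership, with_set_membership):
    seconds = timeit.timeit(fn, number=200)
    print(f"{fn.__name__}: {seconds:.4f} s for 200 runs")
```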

So yes, the translate method is by far the fastest (roughly 40 to 67 times faster in these runs). That said, unless you need to do this very many times, the difference is unlikely to matter.

Use the simplest approach you can think of. If your script then isn't fast enough, use a profiling tool such as cProfile to find out exactly where the bottleneck is.
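
As a minimal sketch of that profiling step, the standard library's `cProfile` and `pstats` modules can be used as follows. The function `strip_all` and the word list are hypothetical stand-ins for your own script.

```python
import cProfile
import io
import pstats
import string

table = str.maketrans("", "", string.punctuation)

def strip_all(words):
    # Hypothetical workload: strip punctuation from every word.
    return [w.translate(table) for w in words]

words = ["hello,", "world!", "it's"] * 10_000  # made-up data

# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = strip_all(words)
profiler.disable()

# Report the ten entries with the highest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

For a whole script, running `python -m cProfile -s cumulative your_script.py` gives a similar report without changing the code.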

Daren Thomas
  • You could do this and it will work in python 3, but it will be slower than using `str.translate()`, even if this is optimized by making `punctuation` a set – Chris_Rands Oct 23 '18 at 14:59