
I have this code:

import string
import unicodedata as ud

def remove_punctuation(self, text):
    exclude = set(string.punctuation)
    a = ''.join(ch for ch in text if ch not in exclude)
    return ''.join(c for c in a if not ud.category(c).startswith('P'))

First, I would like to know what this does:

ch for ch in text if ch not in exclude

How is it possible to write a for loop like that?

Second, I want to replace the punctuation in a text such as "hello_there?my_friend!" with spaces, using the code above. How can I change the code to do that?

John Sall
  • What does *"those punctuation"* mean? – Austin May 24 '19 at 18:17
  • You can read about [list comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) to understand what the line `ch for ch in text if ch not in exclude` does. Basically: it keeps only the chars that are NOT in `exclude`, so every char that is in `exclude` gets dropped – Ralf May 24 '19 at 18:17
  • @Austin I edited the post – John Sall May 24 '19 at 18:19
  • Strictly speaking, that's not a list comprehension but a generator expression. Nevertheless, reading up on list comprehensions would help you understand what's going on. By the way, a list comprehension is usually faster than a generator expression when passed to `join`. – Austin May 24 '19 at 18:25

2 Answers


The piece of code:

a = ''.join([ch for ch in text if ch not in exclude])

is equivalent to

string_without_punctuation = ''
exclude = set(string.punctuation)  # the characters !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
for character in text:
    if character not in exclude:
        string_without_punctuation += character
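
To check that the different spellings really agree, here is a small sketch. The sample string is the one from the question; the variable names are only illustrative.

```python
import string

text = "hello_there?my_friend!"
exclude = set(string.punctuation)

# Generator expression passed straight to join (the form used in the question).
from_generator = ''.join(ch for ch in text if ch not in exclude)

# List comprehension: same filter, but the list is built before joining.
from_list = ''.join([ch for ch in text if ch not in exclude])

# Explicit loop: append each character that is not punctuation.
from_loop = ''
for ch in text:
    if ch not in exclude:
        from_loop += ch

print(from_generator)  # hellotheremyfriend
print(from_generator == from_list == from_loop)  # True
```

The `ch for ch in text if ch not in exclude` part is not a standalone loop statement; it is an expression that produces the filtered characters one by one, which `join` then concatenates.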

You could simply do this to replace the punctuation with spaces:

string_without_punctuation = ''
exclude = set(string.punctuation)  # the characters !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
for character in text:
    if character not in exclude:
        string_without_punctuation += character
    else:
        string_without_punctuation += ' '
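
The loop above only looks at ASCII punctuation, while the code in the question also drops characters whose Unicode category starts with `P`. A minimal sketch that covers both and substitutes a space could look like this; the function name `replace_punctuation` and its `replacement` parameter are hypothetical, not taken from the question.

```python
import string
import unicodedata

ASCII_PUNCTUATION = set(string.punctuation)

def replace_punctuation(text, replacement=' '):
    """Replace ASCII and Unicode punctuation in text with replacement."""
    return ''.join(
        # Swap the character if either check flags it as punctuation.
        replacement
        if ch in ASCII_PUNCTUATION or unicodedata.category(ch).startswith('P')
        else ch
        for ch in text
    )

print(replace_punctuation("hello_there?my_friend!"))  # hello there my friend (plus a trailing space)
print(replace_punctuation("«quoted» text…"))
```

Both checks are kept because `string.punctuation` contains symbols such as `$` and `+` that Unicode classifies as symbols rather than punctuation.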
Simon

I'd recommend using str.translate instead of manually rebuilding the string. Make a lookup table mapping characters to the strings you want to replace them with.

import string

trans = str.maketrans(dict.fromkeys(string.punctuation, ' '))

"hello_there?my_friend!".translate(trans)
# 'hello there my friend '
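
If Unicode punctuation should be covered as well, one possible sketch is to extend the table with every code point whose Unicode category starts with `P`. This is an assumption about what "punctuation" should mean here, mirroring the check in the question's code; scanning all code points takes a moment, so the table is built once and reused.

```python
import string
import sys
import unicodedata

# Code points of every character whose Unicode category starts with 'P'.
unicode_punctuation = (
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith('P')
)

# translate() accepts a dict keyed by code point; map each one to a space.
trans = dict.fromkeys(unicode_punctuation, ' ')
# Add the ASCII set too, since string.punctuation includes symbols like '$'.
trans.update(str.maketrans(dict.fromkeys(string.punctuation, ' ')))

print("hello_there?my_friend!".translate(trans))
print("¿qué tal?".translate(trans))
```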
Patrick Haugh
  • Does it also remove Unicode punctuation? The code I posted above does. – John Sall May 24 '19 at 18:33
  • @JohnSall No, I must have missed that. `translate` can still be used, but you might have even more success with a regular expression. See this answer: https://stackoverflow.com/questions/11066400/remove-punctuation-from-unicode-formatted-strings – Patrick Haugh May 24 '19 at 18:40