
I'm new to Python, and I'm trying to understand the following line:

    "".join(char for char in input if not unicodedata.category(char).startswith('P'))

Source: https://stackoverflow.com/a/11066443/3818487

This code removes all Unicode punctuation from `input`. I don't understand why it works. As far as I can tell, it just iterates over all characters in `input`, skipping the punctuation characters. How can it access `char` before it is declared in the `for` loop? I come from a Java background, so this is very confusing to me.

alexgbelov
  • You could read about 'List Comprehension' in Python. That is what is being done here. – Bharat Jun 10 '16 at 21:21
  • How is "removes all unicode punctuation" different from "iterates over all characters in input ignoring the punctuation characters"? Those seem like the same final result to me. – Tadhg McDonald-Jensen Jun 10 '16 at 21:29
  • Why did you change the identifier `word` in the linked answer to `input` here? `input()` is a built-in function, and shouldn't be masked like that. – MattDMo Jun 10 '16 at 22:27

1 Answer

Written as an ordinary loop, this comprehension is roughly equivalent to the following, which uses a list to collect the non-punctuation characters:

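    import unicodedata

    # `input` must be defined before the loop; note that the name shadows the built-in input()
    output = []
    for char in input:
        # Unicode punctuation categories all start with 'P' (Po, Ps, Pe, ...)
        if not unicodedata.category(char).startswith('P'):
            output.append(char)
    result = ''.join(output)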

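The clauses of a comprehension run in the same order as the loop: the `for` clause binds `char` to each character first, then the `if` clause filters it, and only then is the expression on the far left (here just `char`) evaluated to produce a value. So `char` is always bound before it is used, even though it appears first in the source text. Strictly speaking, the original line is a generator expression rather than a list comprehension because it has no square brackets, but the evaluation order is the same.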
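To see that the two forms behave the same way, here is a small sketch that puts them side by side. The function names and the sample string are made up for illustration, and `text` is used instead of `input` to avoid shadowing the built-in.

```python
import unicodedata


def strip_punctuation_genexp(text):
    # Generator expression: `ch` is bound by the `for` clause, filtered by
    # the `if` clause, and only then yielded by the leading `ch`.
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def strip_punctuation_loop(text):
    # The same logic spelled out as an explicit loop.
    kept = []
    for ch in text:
        if not unicodedata.category(ch).startswith("P"):
            kept.append(ch)
    return "".join(kept)


sample = "Hello, world! (test)"
print(strip_punctuation_genexp(sample))  # Hello world test
print(strip_punctuation_loop(sample))    # Hello world test
```

Reading the generator expression as "for each `ch` in `text`, if it is not punctuation, give me `ch`" may help if you are used to Java, where the loop variable has to be declared before it is used.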