
I am trying to read a tab-separated file and collect all characters except control characters. If a control character is hit, the remainder of the line should be ignored as well. I've tried the following code in Python 3.5, using a for..else loop:

import curses.ascii

input_file = ...
chars = set()
with open(input_file) as file:
    for line in file.readlines():
        source, target = line.split("\t")

        for c in source.strip() + target.strip():
            if curses.ascii.iscntrl(c):
                print("Control char hit.")
                break
            chars.add(c)
        else:
            print("Line contains control character:\n" + line)
            continue

        print("Line contains no control character:\n" + line.strip())

I'd expect this to check each character for being a control character and if it hits one (break is triggered), skip to the next line, hence trigger the else/continue statement.

What happens instead is that continue is always triggered, even if the break statement in the if clause is never reached for a line. Consequently, the final print statement is never reached either.

What am I doing wrong?

Carsten
  • the else is triggered only when break is not triggered. – thebjorn Jun 26 '16 at 17:38
  • Hmmm, I suggest you read more about the `for...else` in python: [How can I make sense of the `else` statement in Python loops?](http://stackoverflow.com/questions/37642573/how-can-i-make-sense-of-the-else-statement-in-python-loops/37643358#37643358) – Moses Koledoye Jun 26 '16 at 17:41
  • Check this out if it helps - http://stackoverflow.com/questions/9979970/why-does-python-use-else-after-for-and-while-loops – Indra Uprade Jun 26 '16 at 18:04
  • Thanks to all of you, your hints have helped. I find the terminology somewhat confusing though. – Carsten Jun 26 '16 at 18:06
  • If you try to pair `else` with `for`, it can be confusing. I don't think the keyword `else` was a great choice for this syntax, but if you pair `else` with `break`, you can see that it actually makes sense. Let me show how it works in human language: `for` each person in a group of suspects, `if` anyone is the criminal, `break` the investigation; `else` report failure. –  May 31 '18 at 14:51

1 Answer

The else block of a for loop is executed only if the loop was never interrupted by a break. So the continue statement in your else block runs only for lines that contain no control characters, which is the opposite of what you expected. From the for statement documentation:

When the items are exhausted (which is immediately when the sequence is empty or an iterator raises a StopIteration exception), the suite in the else clause, if present, is executed, and the loop terminates.

A break statement executed in the first suite terminates the loop without executing the else clause’s suite.
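To make the quoted rule concrete, here is a minimal sketch. The helper name `first_control_char` is hypothetical, and the test `ord(c) < 32` is an assumed stand-in for the control-character check in the question.

```python
def first_control_char(text):
    """Return the first control character in text, or None if there is none."""
    for c in text:
        if ord(c) < 32:  # assumed range: code points 0x00-0x1f
            found = c
            break        # leaving the loop via break skips the else suite
    else:
        found = None     # runs only when the loop finished without a break
    return found
```

Calling it with "ab\x07c" goes through the break branch, while "abc" and the empty string both end up in the else suite.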

A better test to see if there are control characters in a line is to use the any() function with a generator expression:

if any(curses.ascii.iscntrl(c) for c in source.strip() + target.strip()):
    print("Line contains control character:\n" + line)
    continue
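Put together with the rest of the loop from the question, the any() test might look like the sketch below. The names `is_control` and `collect_chars` are hypothetical, and `is_control` is an assumed replacement for `curses.ascii.iscntrl` so that the example also runs where the curses module is not available.

```python
def is_control(c):
    # Assumed stand-in for curses.ascii.iscntrl: code points 0x00-0x1f.
    return ord(c) < 32


def collect_chars(lines):
    """Collect characters from lines without control characters.

    Returns the set of characters and the list of skipped lines.
    """
    chars = set()
    skipped = []
    for line in lines:
        # Like the question's code, this expects exactly one tab per line.
        source, target = line.split("\t")
        text = source.strip() + target.strip()
        if any(is_control(c) for c in text):
            skipped.append(line)  # any() stops at the first control character
            continue
        chars.update(text)
    return chars, skipped
```

Note that this skips the whole line when a control character is found; characters before the control character are not collected either.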

or you could use a regular expression; this'll be faster as the looping over text is done in C code without having to box each individual character in a new str object:

import re

# \x escapes take hexadecimal digits: decimal 31 is \x1f (\x31 would be the digit "1")
control_char = re.compile(r'[\x00-\x1f]')

if control_char.search(source.strip() + target.strip()):
    print("Line contains control character:\n" + line)
    continue
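
As a quick check of what such a pattern matches, here is a small sketch. It assumes the control characters of interest are the ASCII range 0x00 to 0x1f (decimal 0 to 31), written in hexadecimal because `\x` escapes take hex digits. `has_control_char` is a hypothetical helper name.

```python
import re

# Assumed range: ASCII control characters 0x00-0x1f (decimal 0-31).
# Add \x7f to the class if DEL should count as a control character too.
CONTROL_CHAR = re.compile(r"[\x00-\x1f]")


def has_control_char(text):
    """Return True if text contains a character in the range 0x00-0x1f."""
    return CONTROL_CHAR.search(text) is not None
```

Ordinary digits, letters and spaces are not matched by this class, so a line such as "line 1" passes the check.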
Martijn Pieters