Just going off of what you've written:
```python
filename = ...
with open(filename) as file:
    lines = file.readlines()

delimiter_indices = (2, 4, 10, 14)  # The indices in any given line where you expect to see semicolons.
for line_num, line in enumerate(lines, start=1):  # start=1 so messages use 1-based line numbers
    if any(line[index] != ";" for index in delimiter_indices):
        print(f"{filename}: Semicolon expected on line #{line_num}")
```
If a line has fewer than 15 characters, this will raise an `IndexError`. Also, lines like `;;;;;;;;;;;;;;;` are technically valid, since only the delimiter positions are checked.
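To see both caveats concretely, here is a minimal sketch that wraps the check above in a helper and runs it on in-memory lines instead of a file. The name `semicolon_errors` is hypothetical; the logic is the same `any(...)` test.

```python
delimiter_indices = (2, 4, 10, 14)


def semicolon_errors(lines):
    """Return the 1-based numbers of lines that lack a semicolon at some delimiter index.

    Same check as the snippet above, so a line that is too short raises IndexError.
    """
    bad_lines = []
    for line_num, line in enumerate(lines, start=1):
        if any(line[index] != ";" for index in delimiter_indices):
            bad_lines.append(line_num)
    return bad_lines


print(semicolon_errors(["AB;2;43234;343;\n"]))  # [] : well-formed line
print(semicolon_errors(["FE;5;53;34;543;\n"]))  # [] : extra semicolons are not checked
print(semicolon_errors([";;;;;;;;;;;;;;;\n"]))  # [] : "technically valid"
print(semicolon_errors(["AB;2;43234-343;\n"]))  # [1] : index 10 is "-"

try:
    semicolon_errors(["AB;2;43\n"])  # only 8 characters, so line[10] fails
except IndexError as exc:
    print("IndexError:", exc)
```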
EDIT: Assuming you have an input file that looks like this (note the blank line at the end):

```
AB;2;43234;343;
CD;4;41234;443;
FE;5;53234;543;
FE;5;53;34;543;

```
My provided solution works fine on it. I don't see any exceptions or "Semicolon expected on line #..." messages.
However, if your input file ends with two blank lines, this will raise an `IndexError`. If your input file contains a blank line somewhere in the middle, this will also raise an `IndexError`. More generally, any line shorter than 15 characters (counting its trailing newline) will raise an `IndexError`.
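One way to check these claims without touching the disk is to feed the sample input above through `io.StringIO`, which behaves like an open text file. This is a sketch assuming the same check as the first snippet; `SAMPLE`, `check`, and `cases` are illustrative names.

```python
import io

delimiter_indices = (2, 4, 10, 14)

# The sample input from above; the final "\n" is the blank line at the end.
SAMPLE = "AB;2;43234;343;\nCD;4;41234;443;\nFE;5;53234;543;\nFE;5;53;34;543;\n"


def check(text):
    """Run the original check over text, treating it like an open file."""
    messages = []
    lines = io.StringIO(text).readlines()
    for line_num, line in enumerate(lines, start=1):
        if any(line[index] != ";" for index in delimiter_indices):
            messages.append(f"Semicolon expected on line #{line_num}")
    return messages


cases = {
    "sample as shown": SAMPLE,
    "two blank lines at the end": SAMPLE + "\n",
    "blank line in the middle": SAMPLE.replace("\nCD", "\n\nCD"),
}
for label, text in cases.items():
    try:
        print(f"{label}: {check(text)}")
    except IndexError:
        # A blank line is just "\n", so line[2] is already out of range.
        print(f"{label}: IndexError")
```

The first case prints an empty list; the other two hit `IndexError`, because `readlines()` returns each extra blank line as the one-character string `"\n"`.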
You could simply say that every line must meet two criteria to be considered valid:
- The current line must be at least 15 characters long (or `max(delimiter_indices) + 1` characters long).
- All characters at delimiter indices in the current line must be semicolons.
Code:

```python
for line_num, line in enumerate(lines, start=1):
    is_long_enough = len(line) >= (max(delimiter_indices) + 1)
    has_correct_semicolons = all(line[index] == ";" for index in delimiter_indices)
    if not (is_long_enough and has_correct_semicolons):
        print(f"{filename}: Semicolon expected on line #{line_num}")
```
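EDIT: My bad, I broke the short-circuit evaluation for the sake of readability. Because `has_correct_semicolons` is computed on its own line, `line[index]` is still evaluated for lines that are too short, so the `IndexError` is still raised. The following should work: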
```python
for line_num, line in enumerate(lines, start=1):
    is_valid_line = len(line) >= (max(delimiter_indices) + 1) and all(line[index] == ";" for index in delimiter_indices)
    if not is_valid_line:
        print(f"{filename}: Semicolon expected on line #{line_num}")
```
If the line is too short, the second half of the expression is never evaluated thanks to short-circuit evaluation, which prevents the `IndexError`.
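Here is a small sketch that contrasts the two versions on a line that is too short. The helper names `eager_is_valid` and `lazy_is_valid` are illustrative; the only difference is whether the `all(...)` result is computed before the `and` or inside it.

```python
delimiter_indices = (2, 4, 10, 14)
min_line_length = max(delimiter_indices) + 1


def eager_is_valid(line):
    # Both values are computed up front, so line[index] runs even for short lines.
    is_long_enough = len(line) >= min_line_length
    has_correct_semicolons = all(line[index] == ";" for index in delimiter_indices)
    return is_long_enough and has_correct_semicolons


def lazy_is_valid(line):
    # `and` stops at the first falsy operand, so all(...) never runs for short lines.
    return len(line) >= min_line_length and all(line[index] == ";" for index in delimiter_indices)


print(lazy_is_valid("AB;2;43234;343;\n"))  # True
print(lazy_is_valid("AB;2;43\n"))          # False, and no exception
try:
    eager_is_valid("AB;2;43\n")
except IndexError:
    print("eager version raised IndexError")
```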
EDIT:
Since you have so many files with so many lines and so many semicolons per line, you could do the `max(delimiter_indices)` calculation once before the loop instead of recalculating that value for each line. It may not make a big difference, but you could also iterate over the file object directly (which yields the next line on each iteration) instead of loading the entire file into memory with `lines = file.readlines()` before you iterate. This isn't really required, and it's not as cute as using `all` or `any`, but I decided to turn the `has_correct_semicolons` expression into an actual loop over the delimiter indices. That way your error message can be a bit more explicit, pointing to the offending index of the offending line. There's also a separate error message for when a line is too short.
```python
import glob
import os

delimiter_indices = (2, 4, 10, 14)
max_delimiter_index = max(delimiter_indices)
min_line_length = max_delimiter_index + 1

for path in glob.glob(r"C:\path\*.txt"):
    # glob.glob returns plain strings, so take the file name with os.path.basename
    filename = os.path.basename(path)
    print(filename.center(32, "-"))
    with open(path) as file:
        # Iterating over the file object reads one line at a time
        for line_num, line in enumerate(file, start=1):
            is_long_enough = len(line) >= min_line_length
            if not is_long_enough:
                print(f"{filename}: Line #{line_num} is too short")
                continue
            has_correct_semicolons = True
            for index in delimiter_indices:
                if line[index] != ";":
                    has_correct_semicolons = False
                    break
            if not has_correct_semicolons:
                # `index` still holds the delimiter position that failed
                print(f"{filename}: Semicolon expected on line #{line_num}, character #{index}")

print("All files done")
```
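If you want to exercise the script above without pointing it at real data, one option is to pull the per-file logic into a function and run it against a temporary directory. This is a sketch: `check_lines`, `good.txt`, and `bad.txt` are hypothetical names, and the function collects messages in a list instead of printing them so the result is easy to inspect.

```python
import glob
import os
import tempfile

delimiter_indices = (2, 4, 10, 14)
min_line_length = max(delimiter_indices) + 1


def check_lines(filename, lines):
    """Return error messages for an iterable of lines (same rules as the script above)."""
    errors = []
    for line_num, line in enumerate(lines, start=1):
        if len(line) < min_line_length:
            errors.append(f"{filename}: Line #{line_num} is too short")
            continue
        for index in delimiter_indices:
            if line[index] != ";":
                errors.append(f"{filename}: Semicolon expected on line #{line_num}, character #{index}")
                break
    return errors


with tempfile.TemporaryDirectory() as folder:
    # Two hypothetical input files: one clean, one with a bad delimiter and a short line.
    samples = {
        "good.txt": "AB;2;43234;343;\nCD;4;41234;443;\n",
        "bad.txt": "AB;2;43234-343;\nCD;4\n",
    }
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w") as f:
            f.write(text)

    all_errors = []
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        with open(path) as file:  # passing the file object streams it line by line
            all_errors.extend(check_lines(os.path.basename(path), file))

for message in all_errors:
    print(message)
```

Only `bad.txt` produces messages here: one for the `-` at index 10 on line 1 and one because line 2 is too short. Note that the reported character number is the 0-based delimiter index, matching `delimiter_indices`.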