You can't read from and write to the same file simultaneously. When you open a file with mode r+, the I/O pointer is initially at the beginning, but reading will push it to the end (as explained in this answer). So in your case, you read the first line of the file, which moves the pointer to the end of the file. Then you write out that line (unless it's all whitespace), but crucially the pointer stays at the end. That means that on the next iteration of the loop you have already reached the end of the file, and your program stops.
To avoid this, read in all the contents of the file first, then loop over them and write out what you want:
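from pathlib import Path

file_data = Path('MICnew.txt').read_text()
with open('MICnew.txt', 'w') as out_handle:  # THIS WILL OVERWRITE THE FILE!
    # keepends=True keeps each newline, so a blank line is pure whitespace
    for line in file_data.splitlines(keepends=True):
        if not line.isspace():
            out_handle.write(line)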
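But reading everything into memory first and then looping over it is a bit clumsy, and you can instead combine the two steps into one by reading from the original file and writing to a new one: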
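import re

# Read from the original file and write the cleaned lines to a new one
with open('MIC.txt', errors='ignore') as oldfile, \
     open('MICnew.txt', 'w') as newfile:
    for line in oldfile:
        # Strip form feeds from the line ends, then replace non-ASCII characters with a space
        clean_line = re.sub(r'[^\x00-\x7f]', ' ', line.strip('\x0c'))
        if not clean_line.isspace():
            newfile.write(clean_line)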
To drop bytes that cannot be decoded, the file is opened with errors='ignore', which omits the improperly encoded characters; the regular expression then replaces any remaining non-ASCII character with a space. Since the sample file contains a number of rogue form feed characters throughout, the code also strips them explicitly (ASCII code 12, or \x0c in hex).
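If you want to check the behaviour on a small sample before touching your real data, here is a minimal, self-contained sketch of the same idea. The names clean_file and demo, the sample bytes, and the explicit encoding='utf-8' are my own assumptions for illustration, not something taken from your file. It also uses str.replace rather than str.strip, so that form feeds in the middle of a line are removed too, which is a slight variation on the code above.

```python
import re
import tempfile
from pathlib import Path

# Matches any character outside the 7-bit ASCII range
NON_ASCII = re.compile(r'[^\x00-\x7f]')


def clean_file(src, dst):
    """Copy src to dst, dropping blank lines, form feeds and non-ASCII characters.

    Hypothetical helper: UTF-8 is an assumed encoding, adjust it for your data.
    """
    with open(src, encoding='utf-8', errors='ignore') as oldfile, \
         open(dst, 'w', encoding='utf-8') as newfile:
        for line in oldfile:
            # Remove form feeds anywhere in the line, then blank out non-ASCII characters
            clean_line = NON_ASCII.sub(' ', line.replace('\x0c', ''))
            # Skip lines that are empty or contain only whitespace
            if clean_line.strip():
                newfile.write(clean_line)


def demo():
    """Run clean_file on a small made-up sample and return the cleaned text."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / 'sample_in.txt'
        dst = Path(tmp) / 'sample_out.txt'
        # Made-up sample: a blank line, a form feed, an undecodable byte (\xff)
        # and a valid but non-ASCII character (UTF-8 encoded e-acute)
        src.write_bytes(b'first\n\n\x0csecond \xff line\n   \ncaf\xc3\xa9\n')
        clean_file(src, dst)
        return dst.read_text(encoding='utf-8')


if __name__ == '__main__':
    print(repr(demo()))
```

Because the input and output are separate files, nothing is overwritten until you are happy with the result; you can then rename the new file over the old one if you need to.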