2

I have Python code that loops over the lines of a file. I get a UTF-8 decoding error (invalid continuation byte) while reading the file, and I just want my program to ignore it.

I've tried wrapping the code inside the loop in a try/except, but that doesn't work because the error is raised by the for statement itself, when it reads the next line. I've also tried wrapping the whole loop in a try/except, but then once the error is caught the loop doesn't resume.

with open(input_file_path, "r") as input_file:
    for line in input_file:
        # code irrelevant to question

The error is raised on the line for line in input_file:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 5: invalid continuation byte

I want it to skip that line and move on to the next one. Essentially, I want a try/except on the condition of my for loop.

Sheshank S.

3 Answers

3

Does this work? (Edited to the solution the OP found.)

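# errors="surrogateescape" keeps undecodable bytes as lone surrogates instead of raising
with open(input_file_path, "r", encoding="utf8", errors="surrogateescape") as input_file:
    for line in input_file:
        try:
            ...  # your per-line code goes here
        except Exception:
            # skip lines whose processing fails
            continue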
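
To illustrate what the errors argument does, here is a small sketch comparing three standard error handlers on a made-up byte string (the sample data and variable names are hypothetical, not from the question). The same handlers can be passed to open(), as in the code above.

```python
# Hypothetical sample: 0xe0 starts a multi-byte UTF-8 sequence but is followed
# by a space, which is not a valid continuation byte.
raw = b"caf\xe0 au lait\n"

# "ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD (the replacement character).
replaced = raw.decode("utf-8", errors="replace")

# "surrogateescape" maps the bad byte to a lone surrogate (U+DCE0 here), so the
# original bytes can be recovered by encoding with the same handler.
escaped = raw.decode("utf-8", errors="surrogateescape")
round_tripped = escaped.encode("utf-8", errors="surrogateescape")
```

Note that none of these skips the whole line: the line is still returned, just with the bad byte dropped, replaced, or escaped. If the line really has to be skipped, it needs to be read as bytes and decoded by hand.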
chk
1

Have you tried something like this? When a UnicodeDecodeError is raised, the loop continues with the next iteration.

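# Open in binary mode so iterating never decodes (and so never raises)
with open(input_file_path, "rb") as input_file:
    for line in input_file:
        try:
            line_i = line.decode(encoding='utf-8')
        except UnicodeDecodeError:
            # skip lines that are not valid UTF-8
            continue
        # work with line_i here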
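
The same idea can be wrapped in a generator so the calling loop looks like the original one. This is only a sketch: iter_decodable_lines is a hypothetical helper name, and the temporary file exists purely for the demonstration.

```python
import os
import tempfile


def iter_decodable_lines(path, encoding="utf-8"):
    """Yield each line of `path` that decodes cleanly; skip the others."""
    with open(path, "rb") as f:          # binary mode: iteration never decodes
        for raw_line in f:
            try:
                yield raw_line.decode(encoding)
            except UnicodeDecodeError:
                continue                 # drop the whole undecodable line


# Demonstration with a throwaway file whose second line is invalid UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nbad \xe0 line\nthird\n")
    demo_path = tmp.name

good_lines = list(iter_decodable_lines(demo_path))
os.remove(demo_path)
```

One caveat: binary mode does no newline translation, so on files with Windows line endings each yielded string ends in "\r\n" rather than "\n".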
BramAppel
0

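You can open the file as ISO-8859-1 (Latin-1). That encoding maps every possible byte to a character, so decoding never raises, although text that is really UTF-8 will come out as the wrong characters:

with open(input_file_path, "r", encoding="ISO-8859-1") as input_file:
    for line in input_file:
        ...  # your per-line code goes here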
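
A short sketch of the trade-off with ISO-8859-1, using made-up sample bytes: decoding cannot fail, but multi-byte UTF-8 characters turn into mojibake.

```python
# Every one of the 256 byte values decodes under ISO-8859-1, so no error is possible.
all_bytes = bytes(range(256))
decoded_all = all_bytes.decode("iso-8859-1")

# But UTF-8 text is misread: U+00E9 (e with acute) is two bytes in UTF-8,
# and each byte becomes its own Latin-1 character.
utf8_bytes = "\u00e9".encode("utf-8")          # b"\xc3\xa9"
misread = utf8_bytes.decode("iso-8859-1")      # two characters: U+00C3, U+00A9

# The original bytes are still recoverable by encoding back to ISO-8859-1.
recovered = misread.encode("iso-8859-1")
```

So this approach suits cases where the non-ASCII characters don't matter, or where the file really is Latin-1; it does not skip the offending line.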
Sudhansu Kumar