
As the title says, I want to read a text file character by character.

I have a large file that is mostly text but also contains some invalid bytes that Python refuses to decode. I don't currently have time to figure out what is actually wrong with them, so I just want to skip all the problematic bytes using try/except.

with open(filein,'r',encoding='ascii') as file:
    while True:
        try:
            char = file.read(1)
        except UnicodeDecodeError:
            continue

        if not char:
            break

        print(char)

However, this doesn't work: it just skips over all the bytes and outputs nothing.

My guess is that every time I call read(), it reads the entire file before cropping the result, and so treats the whole thing as an error.

So I was wondering if there is a way to literally read a single char out of a file in Python, kinda like fgetc() in C?
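For reference, here is a minimal sketch of one possible way to get fgetc()-like behaviour in text mode without try/except. It assumes that silently dropping undecodable bytes is acceptable, and relies on the `errors="ignore"` argument of the built-in `open()`. The function name `iter_chars_ignoring_errors` and the temporary demo file are hypothetical, made up for illustration only.

```python
import os
import tempfile


def iter_chars_ignoring_errors(path, encoding="ascii"):
    """Yield decoded characters one at a time, dropping undecodable bytes.

    Hypothetical helper: errors="ignore" tells the decoder to discard
    any byte it cannot decode instead of raising UnicodeDecodeError.
    """
    with open(path, "r", encoding=encoding, errors="ignore") as f:
        while True:
            char = f.read(1)  # in text mode this returns at most one character
            if not char:      # empty string means end of file
                break
            yield char


if __name__ == "__main__":
    # Build a small demo file containing one byte (0xFF) that is not valid ASCII.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"ab\xffc\n")
    try:
        for ch in iter_chars_ignoring_errors(tmp.name):
            print(repr(ch))
    finally:
        os.remove(tmp.name)
```

If the bad bytes should stay visible rather than vanish, `errors="replace"` is an alternative that substitutes a replacement character for each of them.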

martineau
Steiner
  • Why is the `if not char: break` there? – Meccano Jun 16 '20 at 14:33
  • Without the break, this code will run forever. Can you elaborate on what you consider an illegal byte? – Rafael W. Jun 16 '20 at 14:39
  • @Rafael They're using ASCII, so that means any byte with more than 7 bits is illegal. – wjandrea Jun 16 '20 at 14:44
  • @poke Actually with a file in text mode, `read(1)` reads a single *character*, though that wasn't reflected [in the tutorial](https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects) until recently. I actually [submitted that correction myself](https://github.com/python/cpython/pull/13852). Either way, ASCII is a single-byte encoding, so it's a moot point. – wjandrea Jun 16 '20 at 15:19
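
The 7-bit rule mentioned in the comments can also be checked by hand. As a rough sketch, assuming the file really is meant to be ASCII, one option is to open it in binary mode, read one byte at a time, and keep only bytes below 0x80. The name `iter_ascii_chars` is hypothetical. Note that binary mode does no newline translation, so `\r\n` comes through as two separate characters.

```python
def iter_ascii_chars(path):
    """Yield each 7-bit (ASCII) byte of the file as a one-character string.

    Hypothetical helper: bytes with the high bit set (>= 0x80) are
    not valid ASCII and are skipped.
    """
    with open(path, "rb") as f:
        while True:
            byte = f.read(1)  # in binary mode this returns at most one byte
            if not byte:      # empty bytes object means end of file
                break
            if byte[0] < 0x80:
                yield byte.decode("ascii")
```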
