I am working on a Python script that needs to read data from a file containing non-ASCII characters. However, when I run my script, I encounter the following error message:

"UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 10: invalid continuation byte"

I have tried to specify the encoding of the file as "utf-8" using the following code:

with open('data.txt', 'r', encoding='utf-8') as f:
    data = f.read()

Unfortunately, the error still occurs with this code.

My expected outcome is to be able to read the data from the file without any errors and handle non-ASCII characters correctly.

Any help and suggestions would be greatly appreciated.

Edit: the contents of data.txt are as follows (for my French assignment):

Bonjour, comment ça va ?
Je suis en train d'apprendre le français.
J'aime bien écouter de la musique française.
Ça fait longtemps que je n'ai pas mangé de croissants frais.
Il y a beaucoup de sites web en français.
Je vais prendre un café au lait s'il vous plaît.
Les macarons sont délicieux.
Je rêve de visiter la Tour Eiffel un jour.
Le vin français est très bon.
  • Please correct your code indentation and provide content in `data.txt` which causes `UnicodeDecodeError`. – cup11 Apr 15 '23 at 04:11
  • Don't specify the encoding in the `open` call. – TanjiroLL Apr 15 '23 at 04:18
  • @coder00 Okay, but may I ask why not? – Joshua Rose Apr 15 '23 at 04:19
  • It seems the encoding of `data.txt` isn't `utf-8`. You should either convert `data.txt` to UTF-8 (some editors can do this) or rely on the system's default encoding in Python (by not specifying the `encoding` parameter in `open`). – cup11 Apr 15 '23 at 04:19
  • Provide [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example)! The given code works as required with the text provided – Bibhav Apr 15 '23 at 04:28
  • "I have tried to specify the encoding of the file as "utf-8" using the following code" - the error message *just told you* that UTF-8 encoding *will not work*. In order to use the file, you must know *what encoding it actually uses*, and specify that. – Karl Knechtel Apr 15 '23 at 04:54
  • @cup11 leaving it as the default encoding will only work if the file is written with that encoding, in the same way that specifying UTF-8 will only work if the file is UTF-8. In order to solve the problem properly, *the encoding must be known*. – Karl Knechtel Apr 15 '23 at 04:55
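
To illustrate the point made in the comments above, here is a minimal sketch of reading a file whose encoding is not yet known. It rests on assumptions rather than on anything confirmed about the asker's file: byte 0xe9 is how `é` is stored in Latin-1 and Windows-1252, so those are plausible guesses for French text. The helper name `read_text_with_fallback` and the candidate list are hypothetical. Note that `latin-1` never raises a decode error, so a successful read does not prove the guess was right; check the resulting text by eye.

```python
import tempfile
from pathlib import Path

# Hypothetical candidate list (an assumption, not a fact about data.txt).
# Order matters: stricter codecs come first, and latin-1 accepts any byte.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def read_text_with_fallback(path, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes without error."""
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this guess was wrong, try the next one
    raise ValueError(f"none of {encodings} could decode {path!r}")


if __name__ == "__main__":
    # Demonstration only: save a sample line as Latin-1, so the accented
    # letters are stored as single bytes that are not valid UTF-8 here.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "demo.txt"
        sample.write_bytes("Le vin français est très bon.".encode("latin-1"))
        text, used = read_text_with_fallback(sample)
        print(used, text)
```

Once the actual encoding has been identified, passing it directly, as in `open('data.txt', 'r', encoding=...)` with that encoding name, is simpler and more reliable than guessing on every read.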
