
I am trying to read and print each line of an RTF file. My current code works until it reaches line 937; at that point it stops reading and raises this error:

Traceback (most recent call last):
  File "/private/var/mobile/Library/Mobile Documents/iCloud~com~omz-software~Pythonista3/Documents/openFolders.py", line 8, in <module>
    for element in file:
  File "/var/containers/Bundle/Application/8F2965B6-AC1F-46FA-8104-6BB24F1ECB97/Pythonista3.app/Frameworks/Py3Kit.framework/pylib/encodings/ascii.py", line 27, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4250: ordinal not in range(128)

file = open("Steno Dictionary.rtf", "r")

#line_number is just to know what line number has been printed on the console.  
line_number = 1

for element in file:
    
    #print(line_number) prints until it reaches 937 and then the error occurs. 
    print(line_number)
    print(element)
    line_number +=1 

How would I modify my current code so that it keeps reading lines until the end of the file? There are still many more lines left. I have searched high and low and cannot figure it out. Thank you very much to whoever can help me out! As a note: I’m using Pythonista on iOS.

Code-Apprentice
Kiko314
  • `UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4250` tells you that there's an unsupported character. Try another encoding, e.g: `file = open("Steno Dictionary.rtf", "r", encoding='utf-8')`. See [list of standard encodings](https://docs.python.org/3/library/codecs.html#standard-encodings) – Tranbi Jun 09 '22 at 11:19
  • Thanks a million! You definitely pointed me in the right direction. The “utf-8” didn’t work. So what I ended up doing was going to the link you provided with the list of standard encodings and trying a few… trying my best to make sense of it, I ended up trying “raw_unicode_escape” and it worked! Now I can read the entire file no problem! – Kiko314 Jun 10 '22 at 00:53
  • Does this answer your question? [How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte"](https://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte) – Code-Apprentice Jun 11 '22 at 05:20

1 Answer


The error means that Python hit a byte in the document (0xe9) that it cannot decode with the default text encoding, which the traceback shows is ASCII in your environment.
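To see why that one byte is the problem, here is a minimal sketch that decodes a hypothetical byte string containing 0xe9 with several codecs. The sample bytes and the `try_decode` helper are made up for illustration, and whether the real file is cp1252 or Latin-1 is an assumption, not something the traceback confirms.

```python
# Hypothetical sample: the word "café" with é stored as the single byte 0xe9,
# the same byte value named in the traceback.
raw = b"caf\xe9"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Try the default from the traceback (ascii) and a few common alternatives.
results = {enc: try_decode(raw, enc) for enc in ("ascii", "utf-8", "cp1252", "latin-1")}

for enc, text in results.items():
    print(enc, "->", text)
```

ASCII only covers byte values 0 to 127, and a lone 0xe9 is not a complete UTF-8 sequence, which would explain why `encoding="utf-8"` can fail on the same file.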

There are a few things you can try. The first is to check whether explicitly setting the encoding to UTF-8 works:

file = open("Steno Dictionary.rtf", "r", encoding="utf-8")
...
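For a fuller picture, here is one way the whole read loop could look with an explicit encoding. This is only a sketch: `numbered_lines` is a hypothetical helper, and the temporary `sample.rtf` file stands in for "Steno Dictionary.rtf" so the example runs on its own.

```python
import os
import tempfile


def numbered_lines(path, encoding="utf-8", errors="strict"):
    """Yield (line_number, line) pairs, decoding the file with an explicit encoding."""
    # The with-statement closes the file even if decoding raises part-way through.
    with open(path, "r", encoding=encoding, errors=errors) as handle:
        for line_number, line in enumerate(handle, start=1):
            yield line_number, line.rstrip("\n")


# Hypothetical stand-in for the real file, written as UTF-8 for the demo.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.rtf")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("first line\nr\u00e9sum\u00e9\n")
    lines = list(numbered_lines(sample_path, encoding="utf-8"))

for line_number, line in lines:
    print(line_number, line)
```

Using `enumerate(handle, start=1)` replaces the manual `line_number` counter from the question.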

If that doesn't work, you can try other encodings, or you can tell Python to replace the bytes it cannot decode with a placeholder, like this:

file = open("Steno Dictionary.rtf", "r", encoding="utf-8", errors="replace")
...

That will decode everything it knows how to, and replace what it doesn't with ? characters.

Alexander