
I was struggling to get Python to read this particular text file (fig. 1).

I tried a few encodings (utf-8, ascii, ...), but none worked. After a while I found the solution in the traceback (fig. 2).

Now my question is: how can this result in an error when Python is reading the file with the right encoding?

Figure 1:

rel_path = "DIR/text.txt"
print ('Getting data from: ' + rel_path + ': \n')
text_file = open(rel_path)

print (text_file.read())

Figure 2:

  File "test.py", line 14, in <module>
    print (text_file.read())
  File "LOCALDIR\Python\Python35\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2018' in position 4590: character maps to <undefined>

Note the file the traceback points to: PYTHONDIR\cp850.py

When I add encoding='cp850' when opening the text file, it works (fig. 3).

Figure 3:

rel_path = "DIR/text.txt"
print ('Getting data from: ' + rel_path + ': \n')
text_file = open(rel_path, encoding='cp850')

print (text_file.read())
Somaar
  • CP850 ([MS-DOS codepage 850](https://en.wikipedia.org/wiki/Code_page_850)) can decode **any** file, but the result may not be readable. By decoding the file as CP850 you only produced 'text' that your console can print. See the duplicate for solutions to make your console able to print more than just the characters defined in that codepage. – Martijn Pieters Sep 21 '15 at 18:07
  • In other words, you *still* need to figure out the correct encoding for the input file. It may well be something other than CP850. – Martijn Pieters Sep 21 '15 at 18:07
  • Ah, okay, I get it. Thanks for the answer! – Somaar Sep 22 '15 at 11:58
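
The point made in the comments can be reproduced without the original file. The sketch below is only an illustration: it assumes the file is really UTF-8 (the actual encoding of text.txt is not known) and uses a made-up sample string. It shows that reading (decoding) succeeds, that the failure comes from encoding the text for a cp850 console, and that decoding as cp850 "works" only because every byte value maps to some character.

```python
# Assumption for illustration: the file's real encoding is UTF-8.
# The sample string is hypothetical; it stands in for the file contents.
raw = "\u2018quoted\u2019".encode("utf-8")  # bytes as they might sit on disk

# Reading with the correct encoding succeeds and yields the curly quotes.
text = raw.decode("utf-8")

# print() on a cp850 console has to ENCODE the text to cp850.
# cp850 has no slot for U+2018, so this is where the error comes from.
try:
    text.encode("cp850")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Decoding the same bytes as cp850 never fails, because cp850 assigns a
# character to all 256 byte values. The result is printable but wrong.
mojibake = raw.decode("cp850")
roundtrip_ok = mojibake.encode("cp850") == raw

# One possible workaround: keep the correct decoding and replace whatever
# the console cannot show. Unencodable characters become '?'.
safe = text.encode("cp850", errors="replace").decode("cp850")
```

Here `encode_failed` ends up `True` and `mojibake` differs from `text`, which matches the comments: opening the file with encoding='cp850' avoids the traceback, but it does not mean cp850 is the file's real encoding.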
