Python read a file that has latin-1 supplement encoding

Question

I am reading an HTML document using Python. It contains many characters such as \x93, \x94, and \xa0. I presume they correspond to the Latin-1 Supplement encoding. Is there a library that deals with this?

Can you also post the code you are using and the error you are getting? — ksohan, May 22 '20 at 13:34
maybe you need only `decode('latin1')` or even `open(... ,encoding='latin1')` — furas, May 22 '20 at 13:36
I am not getting any error. When I download the file, read it in Python using `utf-8` encoding, and print it, I can see occurrences of `\x93` etc. I have also tried reading it with other encodings — OlorinIstari, May 22 '20 at 13:36
First show the URL of the file you downloaded, and the code you use to download it. HTML pages usually declare their encoding, so you don't have to decode them manually. Next, search Google for which encoding uses the codes `\x93`, `\x94`, `\xa0`, and you will know whether it is really `latin1` or something else. — furas, May 22 '20 at 13:40
Using Google I found [Python: Removing \xa0 from string?](https://stackoverflow.com/questions/10993612/python-removing-xa0-from-string). You should learn to use Google before you ask. — furas, May 22 '20 at 13:43

score 0 · Answer 1 · answered May 22 '20 at 13:35

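You can simply decode the raw bytes as Latin-1 in Python: `raw_bytes.decode('latin1')`. In Python 3, `decode` is a method of `bytes`, not `str`, so either read the file in binary mode first or pass `encoding='latin1'` to `open`.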
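
As a minimal sketch of the difference between the candidate encodings, assuming the document is available as bytes (the sample bytes below are made up for illustration, not taken from the asker's file): in Latin-1 the bytes 0x93 and 0x94 decode to invisible C1 control characters, whereas Windows-1252 (`cp1252`) maps them to curly quotes, which is usually what such an HTML page intends.

```python
# Hypothetical sample standing in for the downloaded HTML bytes.
raw = b'\x93quoted\x94\xa0text'

# Latin-1 never fails to decode, but 0x93/0x94 stay as C1 control characters.
as_latin1 = raw.decode('latin1')

# Windows-1252 maps 0x93/0x94 to left/right curly double quotes.
as_cp1252 = raw.decode('cp1252')

print(repr(as_latin1))
print(repr(as_cp1252))
```

If the `cp1252` result shows sensible quotation marks, the file was most likely written in Windows-1252 rather than true Latin-1.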

answered May 22 '20 at 13:35 by Kyrylo Kundik

Hi. Thanks! That seems to work. But I still have occurrences of `\xa0`. By any chance, do you know which encoding that belongs to? – OlorinIstari May 22 '20 at 13:38
@ShrutheeshRamanIyer to answer in what encoding is `\xa0` we would have to use Google - but you could use Google on your own. – furas May 22 '20 at 13:41
`\xa0` is actually the non-breaking space in Latin-1 (ISO 8859-1), also `chr(160)`. You should replace it with a regular space. From https://stackoverflow.com/a/11566398/12181022 – Kyrylo Kundik May 22 '20 at 13:42
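
A short sketch of that replacement, assuming the text has already been decoded to a `str` (the sample string is hypothetical). `str.replace` touches only the non-breaking space; `unicodedata.normalize('NFKC', ...)` also turns it into a plain space but rewrites other compatibility characters as well, so it is the broader option.

```python
import unicodedata

# Hypothetical decoded text containing a non-breaking space (U+00A0).
text = 'Hello\xa0world'

# Targeted fix: swap only the non-breaking space for a regular space.
cleaned = text.replace('\xa0', ' ')

# Broader alternative: NFKC normalization also maps U+00A0 to U+0020,
# along with other compatibility characters (ligatures, full-width forms, ...).
normalized = unicodedata.normalize('NFKC', text)

print(repr(cleaned))
print(repr(normalized))
```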
