Correct decode python string

Question

I am trying to decode the string below:

st='arroz e feij\xc3o, bife ao molho de tomate, pts com quiabo\r\nSALADA: alface, r\xdacula, rabanete e cebola\r\nSOBREMESA: ma\xc7\xc3\r\nSUCO: Amarelo 3\r\n o card\xc1pio cont\xc9m gl\xdaten no p\xc3o. n\xc3o cont\xc9m ovos e lactose. traga sua caneca!'

Using:

st.decode(?)

But I don't know the correct codec.

I can't guess encodings on-the-fly, maybe someone else has this fascinating capability. Have you tried `'utf8'`? — ForceBru, Apr 03 '17 at 15:38
If you're using Windows, try `'mbcs'` which takes the current OS code page. — Mark Ransom, Apr 03 '17 at 15:39
@ForceBru you can tell it's not utf8 because the high hex bytes mostly stand alone. UTF-8 always encodes a non-ASCII character as 2-4 bytes in a row. — Mark Ransom, Apr 03 '17 at 15:40
No, I'm using Linux. I got this string from a web page using urllib. The correct decoding is: 'arroz e feijão, bife ao molho de tomate, pts com quiabo\r\nSALADA: alface, rúcula, rabanete e cebola\r\nSOBREMESA: maçã\r\nSUCO: Amarelo 3\r\n o cardápio contêm glúten no pão. não contêm ovos e lactose. traga sua caneca!'. I tried 'utf-8' and 'ascii'. — hildogjr, Apr 03 '17 at 15:55
This is latin1. Check this out: https://www.python.org/dev/peps/pep-0223/. `st.decode('latin1')` gives you a decoded unicode string. — Dmitry Shilyaev, Apr 03 '17 at 15:59
I tried this codec too. The only difference is that, when I print the string, it now starts with the "u" prefix, but the result didn't change: it still contains the "\x**" escapes. I could search for each "\x**" and replace them one by one with the right character, but I don't think that is the best way. — hildogjr, Apr 03 '17 at 22:11
I found a solution after checking the codecs used in [link](https://pypi.python.org/pypi/chardet). The code that correctly decoded this HTML-parsed text is `tex = tex.decode('windows-1252').lower().encode('utf-8')` — hildogjr, Apr 05 '17 at 02:58
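
For anyone landing here later, the fix from the last comment can be sketched as follows. This is only a minimal sketch, assuming Python 3, where the input is a `bytes` object (in Python 2 the same calls work on a plain `str`). The sample is a shortened copy of the string from the question, and the variable names are illustrative.

```python
# Raw bytes, shortened from the string in the question (assumed sample).
raw = b'arroz e feij\xc3o\r\nSOBREMESA: ma\xc7\xc3\r\no card\xc1pio cont\xc9m gl\xdaten'

# The high bytes are not followed by UTF-8 continuation bytes,
# so decoding as UTF-8 fails and that codec can be ruled out.
try:
    raw.decode('utf-8')
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# windows-1252 maps each of these bytes to an UPPERCASE accented letter.
text = raw.decode('windows-1252')

# Lowercase the decoded text so the accented letters are folded as well.
lowered = text.lower()

# Re-encode as UTF-8 for output or storage.
utf8_bytes = lowered.encode('utf-8')

print(is_utf8)
print(utf8_bytes)
```

latin1 and windows-1252 agree on every byte from 0xA0 to 0xFF, so `latin1` decodes this sample to the same text. The decoded letters are uppercase (`\xc3` is `Ã`), which is presumably why the `.lower()` call was needed to get `feijão`. In Python 2, the `u'...\xc3...'` form is just the `repr` of a unicode string, so seeing it does not by itself mean the decode failed.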
