How can I convert encoding of special characters in python?

Question

I have a file that includes some sentences, but some of them contain weird characters (√•, √§, √Ñ), as shown below. What are they, and is there a way to convert them back to normal characters in Python?

Thanks,

Examples.

Is there an outdoor grill/bbq place? P√§r

Hej Hur l√•ngt aa√§r de till Stallarna? MVH LAILA

√Ñr d√§r sandstrand och hur l√•ngt

If you know which character should be in place of `√•`, use `text = text.replace("√•", expected_char)`. But maybe this text uses a different encoding than the one you used to decode it, e.g. `Latin1`, `Latin2`, `cp1250`, `iso-8859-2`, etc. If you decode it with a different encoding, you may get the correct characters. — furas, Nov 14 '19 at 19:20
Or maybe your system uses an encoding other than UTF-8. As far as I know, macOS uses a slightly different encoding, and that can cause problems. BTW, I found this on Stack Overflow: [How to decode these characters? √° √© √≠](https://stackoverflow.com/questions/15283189/how-to-decode-these-characters-%E2%88%9A-%E2%88%9A-%E2%88%9A%E2%89%A0) — furas, Nov 14 '19 at 19:24
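
One way to test the "different encoding" idea from the comments is to re-encode the garbled text with each suspect codec and see which one yields bytes that decode cleanly as UTF-8. This is only a sketch: the `try_repairs` helper and the candidate list are illustrative assumptions, and a clean round trip is a strong hint, not proof, that a codec is the right one.

```python
# A garbled sample taken from the question.
garbled = "P√§r"

# Codecs that may have been used to decode the bytes by mistake.
# This list is an assumption for illustration and is not exhaustive.
candidates = ["latin1", "cp1250", "cp1252", "iso-8859-2", "mac_roman"]


def try_repairs(text, candidates, intended="utf-8"):
    """Return {codec: repaired_text} for every candidate that round-trips."""
    results = {}
    for codec in candidates:
        try:
            results[codec] = text.encode(codec).decode(intended)
        except UnicodeError:
            # Either the codec cannot represent a character in the text,
            # or the resulting bytes are not valid in the intended encoding.
            continue
    return results


repairs = try_repairs(garbled, candidates)
for codec, repaired in repairs.items():
    print(f"{codec}: {repaired}")
```

Codecs that cannot represent `√` at all are skipped, which narrows the list quickly.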

furas · Accepted Answer · 2019-11-15T07:02:15.323

It looks like the text was decoded with the wrong encoding, MacRoman, instead of UTF-8. It probably came from a macOS system.

If I encode it (to bytes) using MacRoman and then decode it back to a string using UTF-8, I get the correct text:

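```python
# Garbled text: UTF-8 bytes that were wrongly decoded as MacRoman
text = '''Is there an outdoor grill/bbq place? P√§r

Hej Hur l√•ngt aa√§r de till Stallarna? MVH LAILA

√Ñr d√§r sandstrand och hur l√•ngt'''

# Recover the original bytes with MacRoman, then decode them as UTF-8
text = text.encode('MacRoman').decode('utf-8')
print(text)
```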

Result:

Is there an outdoor grill/bbq place? Pär

Hej Hur långt aaär de till Stallarna? MVH LAILA

Är där sandstrand och hur långt

Tested on Linux Mint 19.2, Python 3.7
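
Since the question says only some sentences are affected, applying the round trip to a whole file could fail on lines that are already correct. The sketch below wraps the same encode/decode step in a hypothetical helper, `fix_mojibake` (a name introduced here for illustration), that leaves a line untouched when the round trip raises an error. It assumes the damage is always UTF-8 misread as MacRoman.

```python
def fix_mojibake(text, wrong="mac_roman", right="utf-8"):
    """Undo a wrong decode; return the text unchanged if the round trip fails."""
    try:
        return text.encode(wrong).decode(right)
    except UnicodeError:
        # Covers both UnicodeEncodeError and UnicodeDecodeError:
        # the line was most likely not garbled in this particular way.
        return text


lines = [
    "Is there an outdoor grill/bbq place? P√§r",  # garbled
    "√Ñr d√§r sandstrand och hur l√•ngt",         # garbled
    "Är där sandstrand och hur långt",            # already correct
]

fixed = [fix_mojibake(line) for line in lines]
for line in fixed:
    print(line)
```

Plain ASCII lines pass through unchanged because MacRoman and UTF-8 agree on those bytes.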
