I am trying to get a string to compare equal in Python 3 and in MySQL. I expect it to be UTF-8 on both sides, but the two values are not the same.

I have this string

station√¶r pc > station√¶r pc

and I want it to look like this

stationr pc > stationr pc

I have tried `bytes(string, 'utf-8').decode('utf-8')` and a lot of other things.

I hope someone here can help me strip all the weird characters out of my strings so they are easier to work with. The data comes from external files, so I can't control the encoding.

Jason Aller
ParisNakitaKejser
  • Shouldn't this actually be "stationær pc"? This looks exactly like mojibake for interpreting UTF-8 data with the Mac Roman codec. I can reproduce it with `'stationær'.encode('utf8').decode('macroman')`. – lenz Jan 10 '18 at 13:09
  • In general, there's no need to control the encoding of input data. It's important to *know* what encoding was used, then you can always decode accordingly. – lenz Jan 10 '18 at 13:16
  • If you really want to convert "stationær pc" to "stationr pc", you can do `"stationær pc".encode('ascii', errors='ignore').decode('ascii')`. – lenz Jan 10 '18 at 13:20
  • Thanks, yes, it works this way. I need to ignore them by using `bytes(cat['Title'],'utf-8').decode('utf8').encode('ascii', errors='ignore').strip()`. Thanks a lot :) Will you post an answer? – ParisNakitaKejser Jan 10 '18 at 13:23
  • I'm sure there are dozens of duplicates of this question, no need for another duplicate answer. Searching for "python remove non-ascii characters", I found [this answer](https://stackoverflow.com/a/18430817/1698431), for example. – lenz Jan 10 '18 at 13:43
  • Btw, `bytes(x, 'utf8').decode('utf8') == x` for any x, so you can skip that. – lenz Jan 10 '18 at 13:44
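
For anyone landing here, the approach from the comments can be sketched roughly as follows. This is only a minimal sketch: the helper name `strip_non_ascii` and the variable `garbled` are made up for illustration, and the sample string is written with escapes so it does not depend on the file's own encoding.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character that cannot be encoded as ASCII."""
    # errors="ignore" silently skips characters outside ASCII,
    # and decoding turns the resulting bytes back into a str.
    return text.encode("ascii", errors="ignore").decode("ascii")


# The garbled string from the question: "station" + U+221A + U+00B6 + "r pc"
garbled = "station\u221a\u00b6r pc"

print(strip_non_ascii(garbled))

# Encoding to UTF-8 and decoding again is a no-op, so that step can be dropped.
print(bytes(garbled, "utf8").decode("utf8") == garbled)
```

Note that this throws information away: the letter that was originally there is lost, not just the garbage around it.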

1 Answer

As lenz found out, you have "Mojibake" with CHARACTER SET macroman versus utf8.

See this for ways that Mojibake can happen. (It talks about "latin1" where your case involves "macroman".)
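
If the goal is to recover the original text rather than drop characters, the wrong decoding step can often be reversed. The following is a hedged sketch, assuming the data really is UTF-8 that was decoded as Mac Roman somewhere upstream; the function name `fix_macroman_mojibake` is hypothetical, and strings that do not fit that assumption are returned unchanged.

```python
def fix_macroman_mojibake(text: str) -> str:
    """Undo a UTF-8 -> Mac Roman mis-decoding, if the text looks like one."""
    try:
        # Re-encode with the wrong codec to get the original bytes back,
        # then decode those bytes with the codec that should have been used.
        return text.encode("mac_roman").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Mac Roman, or the bytes are not valid UTF-8:
        # assume the text was not mojibake and leave it alone.
        return text


# "station" + U+221A + U+00B6 + "r pc" is the garbled form from the question.
repaired = fix_macroman_mojibake("station\u221a\u00b6r pc")
print(repaired)
```

The cleaner fix is still to decode the external files with the correct codec when reading them, so the mojibake never reaches Python strings or MySQL in the first place.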

Rick James