When I query MySQL from Python, I first run "SET NAMES 'utf8'", but something is still not right. I get a sequence like this:

å½å®¶1级è¯ä¹¦

I am supposed to get Chinese characters instead, which utf8 handles fine everywhere else.

When I look at the byte sequence, it clearly doesn't match the real utf8 one: same sort of format, but different numbers.

Is this erroneous encoding on Python 2.7's end or bad programming on my end? I know Python 3.x has solved these issues but I cannot use the modules I want in later versions.

I know Python 2.7 can actually display Chinese with the print statement, but otherwise the string is stored and shown as escaped bytes, which I take to be utf8 codes. Look:

>>> '你好'
'\xc4\xe3\xba\xc3'
>>> print '\xc4\xe3\xba\xc3'
你好
user1597652
  • First check that the MySQL database actually contains correctly encoded data: `SELECT HEX(my_column) FROM my_table WHERE ...` – eggyal Jan 14 '13 at 05:28
  • The format is certainly hex and has the same basic structure seen in other coding schemes, but adding "SET NAMES 'utf8'" in the MySQL part of the code only retrieved a mess of characters. Adding charset='utf8' to the connect call of the MySQL module, however, did the trick. Now I have an incompatibility between my (very extensive) preloaded dictionary and the fetched MySQL data. Are there two separate Chinese ranges in the utf8 table? Or might it be Kanji, which is a subset of Hanzi (Chinese)? Or is the problem that 'print' translates whatever (non-utf8) format this might be? – user1597652 Jan 14 '13 at 06:36
  • OK, this might help anyone still listening: "print '\xb9\xfa'" and "print u'\u56fd'" both yield the same character, but the u'...' means it's in unicode format, right? So the other is utf8? Would anyone know how to convert between these? – user1597652 Jan 14 '13 at 06:52

1 Answer

OK, it seems that adding

"SET NAMES 'gbk'"

before the MySQL SELECT query did the trick. Now at least the strings from my dictionary and from the SQL database can be compared. It also seems that gbk is often the preferred character encoding in China.
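
As for the conversion asked about in the comments, here is a minimal sketch, assuming the byte strings really are gbk (the bytes from the question and comments decode cleanly that way). The name `gbk_to_utf8` is a made-up helper for illustration, not part of Python or any MySQL module. The idea is to decode the bytes to unicode first and then encode to whatever byte encoding is needed:

```python
# -*- coding: utf-8 -*-
# Byte strings use an explicit b prefix so the sketch runs on 2.7 and 3.x.

gbk_bytes = b'\xb9\xfa'            # byte string from the comment above
text = gbk_bytes.decode('gbk')     # bytes -> unicode text
assert text == u'\u56fd'           # same character as u'\u56fd'

utf8_bytes = text.encode('utf-8')  # unicode text -> utf8 bytes
assert utf8_bytes == b'\xe5\x9b\xbd'


def gbk_to_utf8(raw):
    """Hypothetical helper: re-encode gbk bytes as utf8 bytes."""
    return raw.decode('gbk').encode('utf-8')


hello = b'\xc4\xe3\xba\xc3'        # the bytes shown in the question
assert hello.decode('gbk') == u'\u4f60\u597d'
assert gbk_to_utf8(hello) == b'\xe4\xbd\xa0\xe5\xa5\xbd'
```

Going through unicode in the middle means the dictionary strings and the database strings can be compared as text no matter which byte encoding each one arrived in.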

user1597652