9

I need to call MySQL stored procedure from my python script. As one of parameters I'm passing a unicode string (Russian language), but I get an error;

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

My script:

  self.db=MySQLdb.connect("localhost", "usr", "pass", "dbName")
  self.cursor=self.db.cursor()
  args=("какой-то текст") #this is string in russian
  self.cursor.callproc('pr_MyProc', args)
  self.cursor.execute('SELECT @_pr_MyProc_2') #getting result from sp
  result=self.cursor.fetchone()
  self.db.commit()

I've read that setting charset='utf8' shuld resolve this problem, but when I use string:

self.db=MySQLdb.connect("localhost", "usr", "pass", "dbName", charset='utf8')

This gives me another error;

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd1' in position 20: surrogates not allowed

Also I've trying to set parametr use_unicode=True, that's not working.

worldofjr
  • 3,868
  • 8
  • 37
  • 49
Gleb
  • 1,412
  • 1
  • 23
  • 55

8 Answers8

5

More things to check on: http://mysql.rjweb.org/doc.php/charcoll#python

Likely items:

  • Start code file with # -*- coding: utf-8 -*- -- (for literals in code)
  • Literals should be u'...'

Can you extract the HEX? какой-то текст should be this in utf8: D0BA D0B0 D0BA D0BE D0B9 2D D182 D0BE D182 20 D0B5 D0BA D181 D182

Rick James
  • 135,179
  • 13
  • 127
  • 222
3

Here are some thoughts. Maybe not a response. I've been playing with python/mysql/utf-8/unicode in the past and this is the things i remember:

Looking at Saltstack mysql module's comment :

https://github.com/saltstack/salt/blob/develop/salt/modules/mysql.py#L314-L322

# MySQLdb states that this is required for charset usage
# but in fact it's more than it's internally activated
# when charset is used, activating use_unicode here would
# retrieve utf8 strings as unicode() objects in salt
# and we do not want that.
#_connarg('connection_use_unicode', 'use_unicode')
connargs['use_unicode'] = False
_connarg('connection_charset', 'charset')

We see that to avoid altering the result string the use_unicode is set to False, while the charset (which could be utf-8) is set as a parameter. use_unicode is more a 'request' to get responses as unicode strings.

You can check real usage in the tests, here: https://github.com/saltstack/salt/blob/develop/tests/integration/modules/test_mysql.py#L311-L361 with a database named '標準語'.

Now about the message UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd1' **. You are using **unicode but you tell the module it is utf-8. It is not utf-8 until you encode your unicode string in utf-8.

Maybe you should try with:

args=(u"какой-то текст".encode('utf-8'))

At least in python3 this is required, because your "какой-то текст" is not in utf-8 by default.

regilero
  • 29,806
  • 6
  • 60
  • 99
  • Link is now broken: I found the new link: https://github.com/saltstack/salt/blob/develop/tests/integration/modules/test_mysql.py#L311-L361 (Note file_name change) – Brandon Deo Aug 08 '17 at 19:31
2

The MySQLdb module is not compatible with python 3. That might be why you are getting problems. I would advise to use a different connector, like PyMySQL or mysqlclient.

Related: 23376103.

Community
  • 1
  • 1
Lucas Moeskops
  • 5,445
  • 3
  • 28
  • 42
2

Maybe you can reload your sys in utf-8 and try to decode the string into utf-8 as following :

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

...

stringUtf8 = u''.join(string_original).decode('utf-8')
Floran Gmehlin
  • 824
  • 1
  • 11
  • 34
2

I had a similar problem very recently but with PostgreSQL. After trying tons of suggestions from SO/ internet, I realized the issue was with my database. I had to drop my database and reinstall Postgres, because for some reason it was not allowing me to change the database's default collation. I was in a hurry so couldn't find a better solution, but would recommend the same, since I was only starting my application in the deployment environment. All the best.

Swakeert Jain
  • 776
  • 5
  • 16
1

What's your database's charset?
use :

show variables like "characetr%"; 

or see your database's charset

JonahCui
  • 125
  • 1
  • 5
1

I see here two problems.

  1. You have unicode but you try to define it as utf-8 by setting parameter "charset". You should first encode your unicode to utf-8 or another encoding system.

  2. If it however doesn't work, try to do so with init_command='SET NAMES UTF8' parameter.

So it will look like:

conn = MySQLdb.connect(charset='utf8', init_command='SET NAMES UTF8')

You can try also this:

cursor = db.cursor()

cursor.execute("SET NAMES UTF8;")
vlad.rad
  • 1,055
  • 2
  • 10
  • 28
0

I encountered a similar issue, which was caused by invalid utf-8 data in the database; it seems that MySQL doesn't care about that, but Python does, because it's following the UTF-8 spec, which says that:

  • surrogate pairs are not allowed in utf-8
  • unpaired surrogates are not allowed in utf-8

If you want to "make it work", you'll have to intercept the MySQL packet and use your own converter which will perform ad-hoc replacements.

Here's one way to "handle" invalid data containing surrogates:

def borked_utf8_decode(data):
    """
    Work around input with unpaired surrogates or surrogate pairs,
    replacing by XML char refs: look for "&#\d+;" after.
    """
    return data.decode("utf-8", "surrogatepass") \
     .encode("utf-8", "xmlcharrefreplace") \
     .decode("utf-8")

Note that the proper way to handle that is context-dependent, but there are some common replacement scenarios, like this one.

And here's one way of plugging this into pymysql (another way is to monkey-patch field processing, see eg. https://github.com/PyMySQL/PyMySQL/issues/631):

import pymysql.converters

# use this in your connection
pymysql_use_unicode = False
conversions = pymysql.converters.conversions
conversions[pymysql.converters.FIELD_TYPE.STRING] = borked_utf8_decode
conversions[pymysql.converters.FIELD_TYPE.VAR_STRING] = borked_utf8_decode
conversions[pymysql.converters.FIELD_TYPE.VARCHAR] = borked_utf8_decode
cJ Zougloub
  • 1,484
  • 10
  • 19