0

I'm trying to store Hebrew in a MySQL table - but I end up seeing question marks. I'm using a Python flask server. I believe character sets and collation are set correctly (here is the output for running 'show create table':

CREATE TABLE some_table (some_field varchar(128) COLLATE utf8mb4_unicode_ci DEFAULT NULL) ENGINE=InnoDB AUTO_INCREMENT=19 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

My SQLALCHEMY_DATABASE_URI is:

'mysql://username:password@localhost/database?charset=utf8'

The data which I insert into the table is initially received within a Python view function, and I process it as follows:

data = request.get_json() 
some_field = data['some_field'].encode('utf-8')

I then insert the "some_field" object into the some_field column.

Note: When I try printing the some_field variable to the Python console I also see a gibberish string, but I assume that's because the console itself can't display Hebrew (not sure if this is causing the problem)...

JMS
  • 1,039
  • 4
  • 12
  • 20
  • "I assume that's because the console itself can't display Hebrew" - so don't print the string, print the representation: `':'.join(hex(ord(x))[2:] for x in some_field)` and see if the variable contains what you think it should, and in correct encoding. Also, please tag the post with [tag:python2] or [tag:python3], as it's very relevant when talking about encodings. Also, check whether your web page is in correct encoding as well. – Amadan May 12 '18 at 17:52
  • @Amadan I printed that and got "5e9:5dc:5d5:5dd", which is correct. It's Python2, thanks for pointing that out. The outgoing web request has its encoding set to utf-8. And... using utf8mb4 in the connection string doesn't work either, it causes an error :/ – JMS May 12 '18 at 18:02
  • You got `5e9:5dc:5d5:5dd` before inserting into the database? What did you get after reading from the database? How do you insert? And how do you retrieve? – Amadan May 13 '18 at 01:24
  • @Amadan I read in the incoming data as "data = request.get_json()". When I do "print data", I see {u'some_field': u'\u05e9\u05dc\u05d5\u05dd'}. However, If I try "print data['some_field']" I see gibberish. I insert the data using SQLALCHEMY, I've tried inserting data['some_field'] directly, as well as inserting it after applying decode('utf-8) and/or encode('utf-8') – JMS May 13 '18 at 05:34
  • Btw, when I try decode('utf-8') I get the error "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)" – JMS May 13 '18 at 05:42
  • 2
    You didn't say, what do you get when you take the data out of the database? (hex dump, so we don't get stuck guessing gibberish :) ) – Amadan May 13 '18 at 16:54
  • Wow, that was a good question, and I wish I had checked that sooner. It turns out that data is being stored as unicode... it's just that the command line renders everything as question marks. This solved the problem for me. Thank you so much! – JMS May 13 '18 at 17:22

4 Answers4

0

When I eventually tried rendering the data directly (rather than just staring it in my mysql console), it turns out that it was stored correctly as unicode. Thank you @Amadan

JMS
  • 1,039
  • 4
  • 12
  • 20
0

Don't use encode or decode. Stick with UTF-8, not Unicode. I predict you will be unhappy with Unicode in the long run. See "question mark" in Trouble with UTF-8 characters; what I see is not what I stored for discussion of likely causes.

Rick James
  • 135,179
  • 13
  • 127
  • 222
0

I'm using MySQLdb with Flask (python), and encountered exactly the same (Hebrew from DB appears as question marks ?????)

solved it with these steps:

  1. [link] - Change character sets for each of the levels: DB, Table and Column!
  2. [link] - SET NAMES 'utf8';
  3. This is what I think made the difference: [link] MySQLdB connection arguments (char set). This is what my (python) connection string looks like now:

    db = MySQLdb.connect(host="username.mysql.pythonanywhere-services.com",  # your host
                         user="username",  # username
                         passwd="yourpassword",  # password
                         db="db(schema)name", # name of the database
                         charset="utf8",
                         use_unicode=True)
    

Notice the charset and use_unicode although I put the default values, this change solved the problem.

I am not sure if all the previous steps and both arguments were necessary.

Stephen Rauch
  • 47,830
  • 31
  • 106
  • 135
Yanoom
  • 21
  • 3
0

Try to add collation='utf8_bin' to the column definition. for example, at models.py :

class Users(db.Model):
    __tablename__ = 'Users'
    id = db.Column(db.Integer, primary_key=True)
    full_name = db.Column(db.String(80, collation='utf8_bin'), unique=False, nullable=False)
    email = db.Column(db.String(120), unique=True, nullable=False)