How do I properly encode something using utf8mb4 in Python?

I am currently migrating my data from Parse to my own MySQL database. In phpMyAdmin, one field has utf8mb4_unicode_ci as its collation. I run into a problem when uploading an emoji (UTF-8 bytes '\xF0\x9F\x8C\x83'). The message value is built like this:

message = MySQLdb.escape_string(unicode(xstr(data.get('message'))).encode('utf-8'))

where xstr() is:

def xstr(s):
    if s is None:
        return ''
    return s

The insert produces this warning:

Warning: Incorrect string value: '\xF0\x9F\x8C\x83' for column 'message' at row 1...

I have also tried dropping .encode('utf-8'), dropping unicode(), and seemingly every other combination. I now think I need to encode the emoji string as utf8mb4. Does anybody know how to do this?
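For reference, here is a minimal sketch of what the encoding step yields. It assumes Python 3 (the code above is Python 2) and that the character behind the bytes in the warning is U+1F303. It suggests that the bytes are already ordinary UTF-8 and that Python has no separate codec named utf8mb4, so the encoding step itself may not be where things go wrong.

```python
# Assumption: Python 3. The question's own code is Python 2.
emoji = "\U0001F303"  # U+1F303, assumed to be the character behind the bytes in the warning

# Python's "utf-8" codec produces the four bytes quoted in the MySQL warning.
encoded = emoji.encode("utf-8")
print(encoded)       # b'\xf0\x9f\x8c\x83'
print(len(encoded))  # 4

# There is no Python codec called "utf8mb4"; asking for one raises LookupError.
try:
    emoji.encode("utf8mb4")
    has_utf8mb4_codec = True
except LookupError:
    has_utf8mb4_codec = False
print(has_utf8mb4_codec)  # False
```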

user3781236
  • Why are you encoding? – Ignacio Vazquez-Abrams Oct 23 '14 at 03:09
  • @IgnacioVazquez-Abrams because of the utf8mb4_unicode_ci collation - if I don't, I get `UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1` – user3781236 Oct 23 '14 at 03:16
  • Have you followed all the steps mentioned here: [How to support full Unicode in MySQL databases](https://mathiasbynens.be/notes/mysql-utf8mb4)? I am no expert in this but base64 is an easier way to store Emojis I think. – Ashwini Chaudhary Oct 23 '14 at 03:17
  • @AshwiniChaudhary The issue is that I need to store more than emojis in this field in the DB: emojis, text, and text with accents all need to be stored here. – user3781236 Oct 23 '14 at 03:20
  • @user3781236 Base64 should work fine for both text + Emojis. – Ashwini Chaudhary Oct 23 '14 at 03:39
  • Can you show exactly what call produces the error? What MySQL calls utf8mb4 is actually just standard UTF-8. For historical reasons, what MySQL calls utf8 is only a subset of valid UTF-8 byte sequences (it uses at most three bytes per character, so it can only represent the BMP). This kind of problem is hard to diagnose even with full code access, let alone from partial information. – Peter DeGlopper Oct 23 '14 at 03:43
  • Also note that collations and character sets are not quite the same thing. Check the character set of the column - I always have to google exactly how to see this, but http://stackoverflow.com/a/1049958/2337736 looks like a good answer. – Peter DeGlopper Oct 23 '14 at 04:08
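The three-byte limit mentioned in the comments can be checked without a database. The sketch below assumes Python 3, and its helper names (`max_utf8_width`, `fits_three_byte_utf8`, `COLUMN_CHARSET_QUERY`) are hypothetical, introduced only for illustration. It reports how many UTF-8 bytes the widest character in a string needs, and includes one possible query against `information_schema.columns` for reading a column's character set and collation. The query is shown as text only and is not executed here.

```python
def max_utf8_width(text):
    """Return the largest number of UTF-8 bytes any single character in text needs."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_three_byte_utf8(text):
    """True if every character fits in three UTF-8 bytes (i.e. lies within the BMP)."""
    return max_utf8_width(text) <= 3


samples = {
    "plain": "hello",
    "accented": "caf\u00e9",   # "café"
    "emoji": "\U0001F303",     # four UTF-8 bytes, outside the BMP
}
for label, value in samples.items():
    print(label, max_utf8_width(value), fits_three_byte_utf8(value))

# Hypothetical helper query: the schema, table, and column names are supplied
# as parameters by whichever MySQL driver executes it.
COLUMN_CHARSET_QUERY = """
SELECT character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = %s AND table_name = %s AND column_name = %s
"""
```

Only the emoji sample needs four bytes. That matches the comment above: plain and accented text fits in a three-byte character set, while the emoji does not.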

0 Answers