0

Sorry I've seen the question answered before but can't seem to solve the problem. I'm trying to properly encode my SQL columns so it will accept emojis:

def create_table():
    op = '''CREATE TABLE conversation (
    text TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL,
    intent TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin,
    entities TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin,
    reply TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL
    )'''
    cursor.execute(op)

def insert_message(text, intent, entities, reply):
    try:
        #avoid using placeholders to prevent SQL injection, will pass text, intent and entities to transaction_builder() instead
        sql = """INSERT INTO conversation  (text, intent, entities, reply) VALUES (%s, %s, %s, %s)"""
        transaction_builder(sql, text, intent, entities, reply)
    except Exception as e:
        print ('error is', str(e))

with open('C:/Users/ELITEBOOK/documents/github/chatbot/chatbot/bot/robot_text.txt','r') as table:
        for robo_line in table.readlines():
            message_reply = robo_line
            message_intent = ''
            message_entities = ''
            insert_message(message_text, message_intent, message_entities, message_reply)

There's some code missing but this is what is relevant. It looks to me like the encoding should be set but it seems like it's not. Sorry if this is a duplicate

edit:

Traceback

1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x81\xF0\x9F...' for column 'reply' at row 1
Amon
  • 2,725
  • 5
  • 30
  • 52
  • So what happens when you execute your code? Do you get an error while the SQL query happens? – amanb Apr 08 '18 at 04:26
  • Yes when I run this code in my shell I get the error while trying to insert into the database with `insert_message` – Amon Apr 08 '18 at 04:28
  • Please can you share the full Traceback error message? You may add it to the question for readability purposes. Don't forget to format it as a code block. – amanb Apr 08 '18 at 04:29
  • Just made the edit, it seemed to me like the columns weren't set to the proper encoding. And I thought I fixed it. I'm taking lines from a dataset I found on Kaggle, it happened to have a lot of emojis in it – Amon Apr 08 '18 at 04:32
  • This [SO answer](https://stackoverflow.com/questions/34165523/error-1366-hy000-incorrect-string-value-xf0-x9f-x98-x9c-for-column-comme?rq=1) might be useful. – amanb Apr 08 '18 at 04:55
  • I looked at that one too, I checked my table and `character_set_result`, `connection_set_client` and `character_set_result` are all set to `utf8mb4` – Amon Apr 08 '18 at 05:01
  • More random notes on Python: http://mysql.rjweb.org/doc.php/charcoll#python Please provide `SHOW CREATE TABLE` and `SHOW VARIABLES LIKE 'char%';` – Rick James Apr 09 '18 at 01:39

0 Answers0