2

I have been trying to export a large pandas dataframe using DataFrame.to_sql to a MySQL database, but the dataframe has unicode characters in some columns, some of which cause warnings during export and are converted to ?.

I managed to reproduce the issue with this example (database login removed):

import pandas as pd
import sqlalchemy
import pymysql

engine = sqlalchemy.create_engine('mysql+pymysql://{}:{}@{}/{}?charset=utf8'.format(*login_info), encoding='utf-8')

df_test = pd.DataFrame([[u'\u010daj',2], \
                       ['čaj',2], \
                       ['špenát',4], \
                       ['květák',7], \
                       ['kuře',1]], \
                       columns = ['a','b'])

df_test.to_sql('test', engine, if_exists = 'replace', index = False, dtype={'a': sqlalchemy.types.UnicodeText()})

The first two rows of the dataframe should be the same, just defined differently.

I get the following warning, and the problematic characters (č, ě, ř) are rendered as ?:

/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC4\\x8Daj' for column 'a' at row 1")
  result = self._query(query)
/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC4\\x8Daj' for column 'a' at row 2")
  result = self._query(query)
/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC4\\x9Bt\\xC3\\xA1k' for column 'a' at row 4")
  result = self._query(query)
/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC5\\x99e' for column 'a' at row 5")
  result = self._query(query)

with the resulting database table test looking like this:

a       b
?aj     2
?aj     2
špenát  4
kv?ták  7
ku?e    1

Curiously, the ž, š and á characters (and others in my full dataset) are processed correctly, so it seems to only affect a subset of unicode characters. As you can see above, I also tried setting utf-8 wherever I could (engine, DataFrame.to_sql) with no effect.

daneeq
  • 61
  • 1
  • 10

2 Answers2

1

pymysql:

import pymysql
con = pymysql.connect(host='127.0.0.1', port=3306,
                  user='root', passwd='******',
                  charset="utf8mb4")

sqlalchemy:

    db_url = sqlalchemy.engine.url.URL(drivername='mysql', host=foo.db_host,
        database=db_schema,
        query={ 'read_default_file' : foo.db_config, 'charset': 'utf8mb4' })

See "Best practice" in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored Explanation of ?:

  • The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
  • The column in the database is CHARACTER SET utf8 (or utf8mb4). Fix this.
  • Also, check that the connection during reading is UTF-8.

(Note: The CHARACTER SETs utf8 and utf8mb4 are interchangeable for European languages.)

These are Czech characters?

Rick James
  • 135,179
  • 13
  • 127
  • 222
0

I haved met the same problem,use pymysql drive as well.

I change mysql drive to mysql-connector,1366 Warning disappear

install mysql-connector drive

pip install mysql-connector

sqlalchemy engine setting like this

create_engine('mysql+mysqlconnector://root:tj1996@localhost:3306/new?charset=utf8mb4')

JayTam
  • 209
  • 2
  • 11