
This is my first time posting on Stack Overflow, so bear with me. I have been scouring the internet for an entire day and I have not been able to fix this problem.

Basically, I have a Pandas DataFrame with unicode characters in the column names, and I am getting a UnicodeEncodeError when I try to use to_sql to store the data in a database. I can reproduce the error with the following code:

import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame([[1, 2], [3, 4]], columns=[u'\xe9', u'b'])
engine = create_engine('mysql://root:pass@localhost/testdb')
df.to_sql('data', engine, if_exists='replace', index=False)

The last line of the traceback looks like this:

C:\Users\isaac_000\Anaconda\lib\site-packages\pandas\io\sql.pyc in _get_column_names_and_types(self, dtype_mapper)
857              dtype_mapper(self.frame.iloc[:, i]),
858              False)
--> 859             for i in range(len(self.frame.columns))
860             ]
861 

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I would appreciate any help you guys can provide. Let me know if you need any more information. Thanks!

Isaac
  • Wow, thanks joris, that seems to have fixed the problem! – Isaac Oct 26 '15 at 21:12
  • Hex `e9` is not Unicode (or UTF-8). In `latin1` it is `é`; perhaps that is what you wanted, but failed to establish that the client was encoded in latin1? – Rick James Feb 07 '19 at 02:24

1 Answer


This is a bug in the current to_sql method; I filed it here: https://github.com/pydata/pandas/issues/11431 (it will probably be fixed in version 0.17.1).

As a workaround, I would suggest one of the following:

  • remove the special unicode characters from the column names for now (see the sketch after this list)
  • or patch your pandas installation (the fix is very simple and small; see here what you would have to change: https://github.com/pydata/pandas/pull/11432)
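
If you go with the first workaround, here is a minimal sketch of how you could strip the accented characters from the column names before calling to_sql. The to_ascii helper is just for illustration, not part of pandas:

import unicodedata

import pandas as pd
from sqlalchemy import create_engine

def to_ascii(name):
    # Decompose accented characters (e.g. u'\xe9' -> u'e' + combining accent),
    # then drop anything that does not encode to plain ASCII.
    normalized = unicodedata.normalize('NFKD', name)
    return normalized.encode('ascii', 'ignore').decode('ascii')

df = pd.DataFrame([[1, 2], [3, 4]], columns=[u'\xe9', u'b'])
df.columns = [to_ascii(col) for col in df.columns]  # u'\xe9' becomes 'e'

engine = create_engine('mysql://root:pass@localhost/testdb')
df.to_sql('data', engine, if_exists='replace', index=False)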
joris