1

I have a panda dataframe with text columns ('testdf'). I am using the below code to insert to TEST table in oracle database

from sqlalchemy import create_engine, Unicode, NVARCHAR
engine = create_engine("oracle+cx_oracle://{user}:{pw}@xxxxx.xxxxx.xx:1521/{db}"
                       .format(user="xxx",
                               pw="xxx",
                               db="xx"))
testdf.to_sql("TEST", con = engine, if_exists = 'append')

But it returns an error with encoding as below:

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f447' in position 237: character maps to <undefined>

How can I solve this problem? I am using Python 3, Jupyter Notebook with Anaconda

1 Answers1

1

This is a common question. I think this answer is a good one. Or this one.

The problem is that Python is trying to convert your data (which is encoded in Unicode) into some other character set to insert into the database, and that other character set doesn't include \U0001f447 (which is in your dataframe). This answer points out that if you look at the full error traceback and not just the error message, it will tell you which charset it's trying to convert into.

There's a few different options. The easiest is probably to pass ?charset=utf8 to cx_oracle in your connect string. This tells cx_oracle to send strings as Unicode.

"oracle+cx_oracle://{user}:{pw}@xxxxx.xxxxx.xx:1521/{db}?charset=utf8"

You could also try setting the NLS_LANG environment variable. This will tell the Oracle server to expect Unicode from your Python application.

os.environ['NLS_LANG']= 'AMERICAN_AMERICA.AL32UTF8'
kfinity
  • 8,581
  • 1
  • 13
  • 20