0

I have to insert few records into an OracleDB using cx_Oracle, however, I'm confronted with the below error for some records due to the presence of non-ASCII characters in my data (some symbols, etc.).

I have tried the following methods while reading input excel file,

data=pd.read_excel(os.path.join(DATA_PATH,fname))
data=pd.read_excel(os.path.join(DATA_PATH,fname),encoding='utf-8')
data=pd.read_excel(os.path.join(DATA_PATH,fname),encoding=sys.getfilesystemencoding())

Later, I have tried to clean the data to make it compatible with my database schema.

#Cleaning for numeric, string and Unicode
data._get_numeric_data().fillna(0,inplace=True)
for col in data.select_dtypes(include='object').columns:
   #data[col]=data[col].str.strip()
   data[col]=data[col].values.astype('unicode')
data.fillna('',inplace=True)
data.replace({r'\u':''},inplace=True)

#Generating query statement
insert_columns='"'.join(data.columns.tolist())
insert_columns='("'+'","'.join(data.columns.tolist())+'")'
bind_variables='(:'+ ",:".join([str(x) for x in list(range(1,len(data.columns)+1))])+')'
insert_query="INSERT INTO " + table_map[source]+" "+ insert_columns + ' VALUES '+ bind_variables 

db_data=list(data.iloc[:,:].itertuples(index=False, name=None))

#Converting to native python dtypes
for row in db_data:
    for item in row:
        if type(item)==np.int64:
            item=int(np.int64(item).item())
        if type(item)==np.float64:
            item=float(np.float64(item).item())     

#Inserting data #1
for i in range(len(db_data)+1):
    print(i)
    cursor.executemany(insert_query, db_data[i:i+1])

#Inserting data #2
cursor.executemany(insert_query, db_data[:])

However, in the end, I'm faced with the same Unicode error for some records. In the python object, I have these as non-ASCII, but I'm unable to filter them out.

In [152]: db_data[10:11][-1][-1]
Out[152]: 'PO3005544, PO3005729\xa0'

In [153]: print(db_data[10:11][-1][-1])
PO3005544, PO3005729 

In [154]: data.iloc[10:11,-1]
Out[154]: 
10    PO3005544, PO3005729 
Name: AMENDMENT_DESCRIPTION, dtype: object

In [155]: data.iloc[10:11,-1].values
Out[155]: array(['PO3005544, PO3005729\xa0'], dtype=object)

In [157]: data.iloc[10:11,-1].values[-1][-1]
Out[157]: '\xa0'

In [158]: len(data.iloc[10:11,-1].values[-1][-1])
Out[158]: 1

Traceback (most recent call last):

File "", line 3, in cursor.executemany(insert_query, db_data[i:i+1])

UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 20: ordinal not in range(128)

  • https://stackoverflow.com/questions/33263669/how-to-remove-nonascii-characters-in-python ? – Christopher Jones Jan 29 '20 at 06:17
  • 1
    "_I have tried to clean the data to make it compatible with my database schema._" - Do you have a good reason for choosing/configuring a database that doesn't support Unicode strings? – GordonAitchJay Mar 28 '20 at 09:15

0 Answers0