
I am trying to export a table from pandas to a Microsoft SQL Server Express database.

Pandas reads a CSV file encoded as UTF-8. If I do df.head(), I can see that pandas shows the foreign characters correctly (they're Greek letters).

However, after exporting to SQL, those characters appear as combinations of question marks and zeros.

What am I doing wrong?

I can't find any option in to_sql() to set the encoding. I guess I must change the syntax when setting up the SQL engine, but how exactly?

This is what I have been trying:

import numpy as np
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine, MetaData, Table, select

ServerName = my_server_name
Database = my_database
params = '?driver=SQL+Server+Native+Client+11.0'
engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database + params,
                       encoding='utf_8', fast_executemany=True)
connection = engine.raw_connection()
cursor = connection.cursor()

file_name = my_file_name

df = pd.read_csv(file_name, encoding='utf_8',
                 na_values=['null', 'N/A', 'n/a', ' ', '-'],
                 dtype=field_map, thousands=',')

print(df[['City', 'Municipality']].head())  # This works
Pythonista anonymous
SQL Server doesn't support UTF-8 until SQL Server 2019. That being said, if you're just performing an `INSERT` then that won't matter for what you're doing. The fact that you say the characters appear as a `?` suggests that you're not using an `nvarchar` in SQL Server, or not using a `varchar` with the correct collation for the Greek characters. – Thom A Jan 20 '20 at 17:11
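
A minimal sketch of the point in the comment above, using pyodbc directly; the connection string reuses the placeholders from the question and the table is hypothetical:

import pyodbc

conn = pyodbc.connect(
    'DRIVER={SQL Server Native Client 11.0};'
    'SERVER=' + my_server_name + ';DATABASE=' + my_database + ';'
    'Trusted_Connection=yes;'
)
cur = conn.cursor()
# Two columns, identical except for varchar vs nvarchar
cur.execute("CREATE TABLE #demo (as_varchar varchar(50), as_nvarchar nvarchar(50))")
cur.execute("INSERT INTO #demo VALUES (?, ?)", 'Αθήνα', 'Αθήνα')
print(cur.execute("SELECT as_varchar, as_nvarchar FROM #demo").fetchone())
# With a non-Greek collation the varchar value typically comes back as '?????',
# while the nvarchar value keeps the Greek text.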

2 Answers


Combining Thom A's comment above and these answers:

pandas to_sql all columns as nvarchar

write unicode data to mssql with python?

I have come up with the code below, which works. Basically, when running to_sql, I export all the object columns as NVARCHAR. This is fine in my specific example, because all the dates are datetime and not object, but it could be messy in cases where dates are stored as object.

Any suggestions on how to handle those cases, too?

from sqlalchemy.types import NVARCHAR

txt_cols = df.select_dtypes(include=['object']).columns
df.to_sql(output_table, engine, schema='dbo', if_exists='replace', index=False,
          dtype={col_name: NVARCHAR for col_name in txt_cols})

P.S. Note that I don't see this answer as a duplicate of the others; there are some differences, such as the use of df.select_dtypes.
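
One possible way to handle the object-dtype date case asked about above is to convert those columns to datetime before selecting the text columns. A sketch, assuming the date columns are known by name (the names here are hypothetical):

date_cols = ['OrderDate', 'ShipDate']  # columns holding dates as object
for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors='coerce')

# After the conversion, select_dtypes(['object']) no longer picks up the date
# columns, so only genuine text columns are mapped to NVARCHAR.
txt_cols = df.select_dtypes(include=['object']).columns
df.to_sql(output_table, engine, schema='dbo', if_exists='replace', index=False,
          dtype={col_name: NVARCHAR for col_name in txt_cols})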

Pythonista anonymous

In df.to_sql, specify the type for these columns, e.g. dtype={'column_name1': sqlalchemy.types.NVARCHAR(length=50), 'column_name2': sqlalchemy.types.NVARCHAR(length=70)}.
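
For example, applied to the columns from the question (a sketch; the lengths are illustrative):

import sqlalchemy

df.to_sql(output_table, engine, schema='dbo', if_exists='replace', index=False,
          dtype={'City': sqlalchemy.types.NVARCHAR(length=50),
                 'Municipality': sqlalchemy.types.NVARCHAR(length=70)})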

MR.Max