
I have non-English characters in one of my dataframe columns: plain strings previously retrieved from someone else's SQL Server database. I want to insert that column into my SQL Server table, whose target column is NVARCHAR(MAX). Every time I try, I get question marks there instead of the original characters.

I thought the solution might be to convert the dataframe's string column to a Unicode string, but I didn't really understand how to do that.

Currently I don't do any modification to the string column and try to put it into SQL right away:

data = ncon.list_tuples(value, value.columns.tolist())
query = "INSERT INTO ipm03_6010_85.dbo.Sales_SS VALUES{}".format(data)

To avoid selecting the data in Python, I also tried to do the join directly on SQL Server. I failed with it at first because I had the LEFT JOIN after the GROUP BY and ORDER BY clauses; the correct order is:

SELECT s.product_id, s.code, prods.name, SUM(s.quantity) AS total_quantity
FROM dbo.Sales_SS AS s
LEFT JOIN (SELECT id, name FROM book.Products) AS prods ON prods.id = s.product_id
GROUP BY s.product_id, s.code, prods.name
ORDER BY s.product_id ASC

I also tried encoding the column myself:

df['name'] = df['name'].astype(str).str.encode(encoding='UTF-8', errors='strict')

but couldn't make it work.

I want the question marks to be gone and everything to be stored correctly.

  • SQL Server uses UTF16, not UTF8. It doesn't need any conversions either; the driver will take care of that. Use parameterized queries and *don't* try to convert anything in code. – Panagiotis Kanavos May 16 '19 at 15:05
  • In almost all languages, question marks appear when you convert text *to a non-Unicode codepage*. Any characters that can't be represented in that codepage will be replaced with question marks. – Panagiotis Kanavos May 16 '19 at 15:07
  • I have 650k rows in the dataframe and currently insert 1,000 rows per INSERT. Can you suggest how to make a parameterized query instead of selecting 1,000 rows from the df, converting to a list, and then to a tuple? –  May 16 '19 at 15:26
  • Why load them into a dataframe in the first place? SQL Server has bulk insert tools and commands, e.g. `bcp` or the `BULK INSERT` command, that can import data quickly from flat files or, since SQL Server 2017, full CSV files (a sketch of that route follows after these comments). That's shown [in this SO question](https://stackoverflow.com/questions/29638136/how-to-speed-up-bulk-insert-to-ms-sql-server-from-csv-using-pyodbc) – Panagiotis Kanavos May 16 '19 at 15:30
  • You haven't posted *how* you execute the queries either. If you use `pyodbc`, [this SO question](https://stackoverflow.com/questions/37008848/basic-pyodbc-bulk-insert) has answers that show how to use `executemany` to insert batches (a sketch follows after these comments). That's still not BULK INSERT, as the comments there explain, just batched INSERT statements. – Panagiotis Kanavos May 16 '19 at 15:32
  • Notice the `?` characters. Those represent positional parameters. `pyodbc` will send the query along with the parameters to the server. It won't *substitute* them into the string the way `format` does, which means there are no string, date, or numeric conversion issues. No chance of SQL injection either. – Panagiotis Kanavos May 16 '19 at 15:35
  • SQL Server 2017 can run Python internally too, which means you can just "return" the dataframe and have SQL insert the data into a table. Check [Data Interpolation and Transformation using Python in SQL Server 2017](https://www.sqlshack.com/data-interpolation-and-transformation-using-python-in-sql-server-2017/) for an example. The Python script, with dataframes and all, is used as the source for an `INSERT INTO` query. – Panagiotis Kanavos May 16 '19 at 15:38
  • I would never have thought parameters could make a difference, but now it makes sense. Thanks again. –  May 16 '19 at 15:45
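
Putting the comments together, a minimal sketch of the parameterized, batched insert with `pyodbc` might look like the following (the connection string, server name, and the optional `fast_executemany` flag are illustrative assumptions, not from the question):

import pyodbc

# Hypothetical connection string; substitute your own server and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=ipm03_6010_85;UID=user;PWD=secret"
)
cursor = conn.cursor()
cursor.fast_executemany = True  # optional speed-up, pyodbc >= 4.0.19

# The ? placeholders are positional parameters: the driver sends each value
# as a typed parameter, so NVARCHAR columns receive proper UTF-16 strings
# and nothing is lost to a codepage conversion. No SQL injection risk either.
query = ("INSERT INTO ipm03_6010_85.dbo.Sales_SS "
         "(general_id, date, product_id, quantity, price, code, name, description) "
         "VALUES (?, ?, ?, ?, ?, ?, ?, ?)")

# value is the dataframe from the question; insert in 1,000-row batches.
for start in range(0, len(value), 1000):
    batch = value.iloc[start:start + 1000]
    cursor.executemany(query, batch.values.tolist())
conn.commit()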
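
And a minimal sketch of the flat-file route mentioned above, assuming the CSV is written to a path the SQL Server service can read (the staging path and connection details are illustrative; `FORMAT = 'CSV'` requires SQL Server 2017+):

import pyodbc

# BULK INSERT reads the file on the *server*, so the CSV must be on a path
# visible to the SQL Server machine, not just to this script.
value.to_csv(r"C:\staging\sales.csv", index=False, header=False, encoding="utf-8")

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=ipm03_6010_85;Trusted_Connection=yes",
    autocommit=True,
)
conn.execute(
    "BULK INSERT dbo.Sales_SS "
    "FROM 'C:\\staging\\sales.csv' "
    "WITH (FORMAT = 'CSV', CODEPAGE = '65001')"  # CODEPAGE 65001 = UTF-8
)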

1 Answer


Panagiotis Kanavos, I can't thank you enough; it was getting a bit too frustrating until you pointed me in the right direction. The lines below solved literally everything, and it seems much faster as well.

query = "INSERT INTO ipm03_6010_85.dbo.Sales_SS (general_id, date, product_id, quantity, price, code, name, description) VALUES (?, ?, ?, ?, ?, ?, ?, ?)"
cursor.executemany(query, value.values.tolist())
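
The `?` placeholders are what fix the encoding: the driver binds each string as an NVARCHAR parameter, so no codepage conversion happens and no question marks appear.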