Since the discontinuation of the pymssql library (which seems to be under development again), we started using the cTDS library developed by the smart people at Zillow, and to our surprise it supports FreeTDS bulk insert.
As the name suggests, cTDS is written in C on top of the FreeTDS library, which makes it fast, really fast. IMHO this is the best way to bulk insert into SQL Server: the ODBC driver does not support bulk insert, and executemany or fast_executemany, as often suggested, aren't really bulk insert operations. The BCP tool and T-SQL BULK INSERT have their limitations, since they need the file to be accessible by the SQL Server, which can be a deal breaker in many scenarios.
Below is a naive implementation of bulk inserting a CSV file. Please forgive me for any bug; I wrote this from memory without testing.
I don't know why, but for my server, which uses the Latin1_General_CI_AS collation, I needed to wrap the data going into NVarChar columns with ctds.SqlVarChar. I opened an issue about this, but the developers said the naming is correct, so I changed my code to keep myself mentally healthy.
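In short, the wrapping that worked for me is the reverse of what the class names suggest (a minimal sketch; the variable names are hypothetical):

import ctds

value_for_varchar_column = ctds.SqlNVarChar("plain text")
value_for_nvarchar_column = ctds.SqlVarChar("unicode text".encode("utf-16le"))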
import csv
import datetime
import time
from typing import Optional

import ctds
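
# Placeholder connection settings for this sketch; adjust the server,
# credentials and database for your environment.
DBCONFIG = {
    "server": "localhost",
    "port": 1433,
    "user": "sa",
    "password": "change-me",
    "database": "MYDB",
}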
def _to_varchar(txt: str) -> Optional[ctds.SqlNVarChar]:
    """
    Wraps strings bound for VarChar columns into ctds.SqlNVarChar
    (the reversed naming is intentional; see the note above).
    """
    if txt == "null":
        return None
    return ctds.SqlNVarChar(txt)
def _to_nvarchar(txt: str) -> Optional[ctds.SqlVarChar]:
    """
    Wraps strings bound for NVarChar columns into ctds.SqlVarChar,
    encoded as UTF-16LE, the encoding SQL Server uses for NVarChar.
    """
    if txt == "null":
        return None
    return ctds.SqlVarChar(txt.encode("utf-16le"))
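
# Minimal sketches of the two remaining converters used by transform();
# the 'null' handling mirrors the text helpers above, and the parsing
# is an assumption: adjust it to your data.
def _to_datetime(txt: str) -> Optional[datetime.datetime]:
    """Parses an ISO-format timestamp, e.g. '2021-01-31 23:59:59'."""
    if txt == "null":
        return None
    return datetime.datetime.fromisoformat(txt)


def _to_int(txt: str) -> Optional[int]:
    """Parses an integer column value."""
    if txt == "null":
        return None
    return int(txt)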
def read(file):
    """
    Open CSV file.
    Each line is yielded as a column:value dict.
    https://docs.python.org/3/library/csv.html?highlight=csv#csv.DictReader
    """
    with open(file, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            yield row
def transform(row):
    """
    Do transformations to data before loading.

    From the ctds docs: data specified for bulk insertion into text
    columns (e.g. VARCHAR, NVARCHAR, TEXT) is not encoded on the
    client in any way by FreeTDS. Because of this behavior it is
    possible to insert textual data with an invalid encoding and cause
    the column data to become corrupted. To prevent this, it is
    recommended the caller explicitly wrap the object with either
    ctds.SqlVarChar (for CHAR, VARCHAR or TEXT columns) or
    ctds.SqlNVarChar (for NCHAR, NVARCHAR or NTEXT columns). For
    non-Unicode columns, the value should first be encoded to the
    column's encoding (e.g. latin-1). By default ctds.SqlVarChar will
    encode str objects to utf-8, which is likely incorrect for most SQL
    Server configurations.

    https://zillow.github.io/ctds/bulk_insert.html#text-columns
    """
    row["col1"] = _to_datetime(row["col1"])
    row["col2"] = _to_int(row["col2"])
    row["col3"] = _to_nvarchar(row["col3"])
    row["col4"] = _to_varchar(row["col4"])
    return row
def load(rows):
    stime = time.time()
    with ctds.connect(**DBCONFIG) as conn:
        with conn.cursor() as curs:
            curs.execute("TRUNCATE TABLE MYSCHEMA.MYTABLE")
        loaded_lines = conn.bulk_insert("MYSCHEMA.MYTABLE", map(transform, rows))
    etime = time.time()
    print(loaded_lines, "rows loaded in", etime - stime, "seconds")
if __name__ == "__main__":
    load(read('data.csv'))
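
For bigger loads, bulk_insert also accepts a batch_size argument (commit every N rows instead of one huge transaction) and a tablock flag (take a table lock for the duration of the insert). A minimal variation of the call above, with assumed values:

loaded_lines = conn.bulk_insert(
    "MYSCHEMA.MYTABLE",
    map(transform, rows),
    batch_size=10000,  # commit every 10k rows
    tablock=True,      # lock the table during the load
)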