
I have a pandas DataFrame that is created dynamically, with column names that vary. I'm trying to push it to SQL, but I don't want the columns to land in MS SQL Server as the default datatype, "text" (can anyone explain why this is the default? Wouldn't it make sense to use a more common datatype?)

Does anyone know how I can specify a datatype for all columns?

column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = #Data type for all columns#)

The dtype argument takes a dict, and since I don't know what the columns will be, it is hard to set them all to 'sqlalchemy.types.NVARCHAR'.

This is what I would like to do:

column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = 'sqlalchemy.types.NVARCHAR')

Any help/understanding of how best to specify all column types would be much appreciated!

flyingmeatball
  • For those who are googling it now, I want to mention that this desirable syntax was implemented – Liza Mar 09 '17 at 20:49
  • @Liza, can you point out how it was implemented and how to use it? – dingx Aug 21 '19 at 03:12
  • https://github.com/pandas-dev/pandas/blob/0bde5690b0f98e3bbc728f186eaa55323d172a0b/pandas/io/sql.py#L481 "dtype : single SQLtype or dict of column name to SQL type, default None Optional specifying the datatype for columns. The SQL type should be a SQLAlchemy type, or a string for sqlite3 fallback connection. If all columns are of the same type, one single value can be used." found in source code. If I am correct, it can be used exactly as the question demonstrated: dtype = 'sqlalchemy.types.NVARCHAR' – Liza Aug 22 '19 at 15:49
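Per the docstring quoted in the comment above, recent pandas versions do accept a single type for dtype. A minimal sketch of that usage (an in-memory SQLite engine and made-up column names, purely for illustration):

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")  # in-memory database for illustration
df = pd.DataFrame({"colA": [1, 2], "colB": ["x", "y"]})

# A single SQLAlchemy type is applied to every column
df.to_sql("load_errors", engine, if_exists="append", index=False,
          dtype=sqlalchemy.types.NVARCHAR(length=255))

# Verify the resulting column types
cols = sqlalchemy.inspect(engine).get_columns("load_errors")
print({c["name"]: str(c["type"]) for c in cols})
```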

3 Answers


To use dtype, pass a dictionary keyed to each data frame column with corresponding sqlalchemy types. Change keys to actual data frame column names:

import sqlalchemy
import pandas as pd
...

column_errors.to_sql('load_errors', push_conn, 
                      if_exists='append', 
                      index=False, 
                      dtype={'datefld': sqlalchemy.types.DateTime(), 
                             'intfld': sqlalchemy.types.INTEGER(),
                             'strfld': sqlalchemy.types.NVARCHAR(length=255),
                             'floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True),
                             'booleanfld': sqlalchemy.types.Boolean()})

You may even be able to create this dtype dictionary dynamically if you do not know column names or types beforehand:

def sqlcol(dfparam):
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "object" in str(j):
            dtypedict.update({i: sqlalchemy.types.NVARCHAR(length=255)})
        elif "datetime" in str(j):
            dtypedict.update({i: sqlalchemy.types.DateTime()})
        elif "float" in str(j):
            dtypedict.update({i: sqlalchemy.types.Float(precision=3, asdecimal=True)})
        elif "int" in str(j):
            dtypedict.update({i: sqlalchemy.types.INTEGER()})
    return dtypedict

outputdict = sqlcol(column_errors)
column_errors.to_sql('load_errors', 
                     push_conn, 
                     if_exists = 'append', 
                     index = False, 
                     dtype = outputdict)
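As a quick sanity check, a helper like the one above maps a small throwaway frame's dtypes to the expected SQLAlchemy types (column names here are made up; the function is repeated so the snippet is self-contained):

```python
import pandas as pd
import sqlalchemy

def sqlcol(dfparam):
    # Map each pandas dtype to a corresponding SQLAlchemy type, as in the answer above
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "object" in str(j):
            dtypedict[i] = sqlalchemy.types.NVARCHAR(length=255)
        elif "datetime" in str(j):
            dtypedict[i] = sqlalchemy.types.DateTime()
        elif "float" in str(j):
            dtypedict[i] = sqlalchemy.types.Float(precision=3, asdecimal=True)
        elif "int" in str(j):
            dtypedict[i] = sqlalchemy.types.INTEGER()
    return dtypedict

df = pd.DataFrame({"name": ["a"], "qty": [1], "price": [1.5],
                   "ts": pd.to_datetime(["2015-12-01"])})
mapping = sqlcol(df)
print({k: type(v).__name__ for k, v in mapping.items()})
```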
Parfait

You can create this dict dynamically if you do not know the column names in advance:

from sqlalchemy.types import NVARCHAR
df.to_sql(...., dtype={col_name: NVARCHAR for col_name in df})

Note that you have to pass the sqlalchemy type object itself (or an instance, to specify parameters like NVARCHAR(length=10)), not a string as in your example.
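To illustrate, the dict comprehension can hold either the type class or a parameterized instance (frame and column names below are made up):

```python
import pandas as pd
from sqlalchemy.types import NVARCHAR

df = pd.DataFrame({"code": ["ab"], "descr": ["xy"]})

# Either the type class itself...
dtype_default = {col: NVARCHAR for col in df}
# ...or an instance, when parameters such as length are needed
dtype_sized = {col: NVARCHAR(length=10) for col in df}
print(dtype_sized)
```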

joris
  • Thanks, nice workaround - I should have thought of that! Wish there were a native function to do that - would be nice if the dtype argument recognized a passed type as the dtype for all columns, whereas a dict was for the specific columns mentioned. – flyingmeatball Dec 21 '15 at 03:57
  • 1
    @flyingmeatball That would be to difficult to add I think, I opened an enhancement request: https://github.com/pydata/pandas/issues/11886 – joris Dec 22 '15 at 15:18
  • @joris why `NVARCHAR` instead of `VARCHAR` here? – David542 Oct 23 '20 at 22:05
  • 1
    @David542 that was the original question, see eg here for an explanation for the difference between both: https://stackoverflow.com/questions/144283/what-is-the-difference-between-varchar-and-nvarchar – joris Oct 24 '20 at 07:20

Python's standard collections library includes the defaultdict class, which lets us specify, via a lambda expression, what value should be returned when a key is missing.

Putting this to use for your example:

from sqlalchemy.types import NVARCHAR
from collections import defaultdict
always_nvarchar = defaultdict(lambda: NVARCHAR(length=255))
# ...
column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = always_nvarchar)
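One caveat worth checking against your pandas version: a defaultdict only manufactures a value on an actual key lookup (d[k]); a membership test (k in d) does not trigger the factory, so if pandas guards the lookup with an `in` check, the default never fires. A stdlib-only demonstration of that behavior (the string value is just a stand-in for an sqlalchemy type):

```python
from collections import defaultdict

always_text = defaultdict(lambda: "NVARCHAR(255)")  # stand-in for NVARCHAR(length=255)

# __getitem__ triggers the factory and inserts the key...
value = always_text["some_column"]

# ...but a plain membership test does not
missing = "other_column" in always_text
print(value, missing)
```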
Eric Kramer