
I have a pandas DataFrame that is created dynamically, with column names that vary. I'm trying to push it to SQL, but I don't want the columns to land in MS SQL Server as the default datatype, "text" (can anyone explain why this is the default? Wouldn't it make sense to use a more common datatype?)

Does anyone know how I can specify a datatype for all columns?

column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = #Data type for all columns#)

The dtype argument takes a dict, and since I don't know what the columns will be, it is hard to set them all to 'sqlalchemy.types.NVARCHAR'.

This is what I would like to do:

column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = 'sqlalchemy.types.NVARCHAR')

Any help/understanding of how best to specify all column types would be much appreciated!

flyingmeatball
  • For those who are googling it now, I want to mention that this desirable syntax was implemented – Liza Mar 09 '17 at 20:49
  • @Liza, can you point out how it was implemented and how to use it? – dingx Aug 21 '19 at 03:12
  • https://github.com/pandas-dev/pandas/blob/0bde5690b0f98e3bbc728f186eaa55323d172a0b/pandas/io/sql.py#L481 "dtype : single SQLtype or dict of column name to SQL type, default None Optional specifying the datatype for columns. The SQL type should be a SQLAlchemy type, or a string for sqlite3 fallback connection. If all columns are of the same type, one single value can be used." found in source code. If I am correct, it can be used exactly as the question demonstrated: dtype = 'sqlalchemy.types.NVARCHAR' – Liza Aug 22 '19 at 15:49
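Per the docstring quoted in the comment above, recent pandas versions do accept a single type for dtype. A minimal sketch of that usage (an in-memory SQLite engine and made-up column names, purely for illustration):

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")  # in-memory database for illustration
df = pd.DataFrame({"colA": [1, 2], "colB": ["x", "y"]})

# A single SQLAlchemy type is applied to every column
df.to_sql("load_errors", engine, if_exists="append", index=False,
          dtype=sqlalchemy.types.NVARCHAR(length=255))

# Verify the resulting column types
cols = sqlalchemy.inspect(engine).get_columns("load_errors")
print({c["name"]: str(c["type"]) for c in cols})
```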

3 Answers


To use dtype, pass a dictionary keyed to each data frame column with corresponding sqlalchemy types. Change keys to actual data frame column names:

import sqlalchemy
import pandas as pd
...

column_errors.to_sql('load_errors', push_conn, 
                      if_exists='append', 
                      index=False, 
                      dtype={'datefld': sqlalchemy.types.DateTime(), 
                             'intfld': sqlalchemy.types.INTEGER(),
                             'strfld': sqlalchemy.types.NVARCHAR(length=255),
                             'floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True),
                             'booleanfld': sqlalchemy.types.Boolean()})

You may even be able to create this dtype dictionary dynamically if you do not know column names or types beforehand:

def sqlcol(dfparam):
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "object" in str(j):
            dtypedict.update({i: sqlalchemy.types.NVARCHAR(length=255)})
        elif "datetime" in str(j):
            dtypedict.update({i: sqlalchemy.types.DateTime()})
        elif "float" in str(j):
            dtypedict.update({i: sqlalchemy.types.Float(precision=3, asdecimal=True)})
        elif "int" in str(j):
            dtypedict.update({i: sqlalchemy.types.INTEGER()})
    return dtypedict

outputdict = sqlcol(column_errors)
column_errors.to_sql('load_errors', 
                     push_conn, 
                     if_exists = 'append', 
                     index = False, 
                     dtype = outputdict)
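As a quick sanity check, a helper like the one above maps a small throwaway frame's dtypes to the expected SQLAlchemy types (column names here are made up; the function is repeated so the snippet is self-contained):

```python
import pandas as pd
import sqlalchemy

def sqlcol(dfparam):
    # Map each pandas dtype to a corresponding SQLAlchemy type, as in the answer above
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "object" in str(j):
            dtypedict[i] = sqlalchemy.types.NVARCHAR(length=255)
        elif "datetime" in str(j):
            dtypedict[i] = sqlalchemy.types.DateTime()
        elif "float" in str(j):
            dtypedict[i] = sqlalchemy.types.Float(precision=3, asdecimal=True)
        elif "int" in str(j):
            dtypedict[i] = sqlalchemy.types.INTEGER()
    return dtypedict

df = pd.DataFrame({"name": ["a"], "qty": [1], "price": [1.5],
                   "ts": pd.to_datetime(["2015-12-01"])})
mapping = sqlcol(df)
print({k: type(v).__name__ for k, v in mapping.items()})
```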
Parfait

You can create this dict dynamically if you do not know the column names in advance:

from sqlalchemy.types import NVARCHAR
df.to_sql(...., dtype={col_name: NVARCHAR for col_name in df})

Note that you have to pass the sqlalchemy type object itself (or an instance, to specify parameters like NVARCHAR(length=10)), not a string as in your example.
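To illustrate, the dict comprehension can hold either the type class or a parameterized instance (frame and column names below are made up):

```python
import pandas as pd
from sqlalchemy.types import NVARCHAR

df = pd.DataFrame({"code": ["ab"], "descr": ["xy"]})

# Either the type class itself...
dtype_default = {col: NVARCHAR for col in df}
# ...or an instance, when parameters such as length are needed
dtype_sized = {col: NVARCHAR(length=10) for col in df}
print(dtype_sized)
```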

joris
  • Thanks, nice workaround - I should have thought of that! Wish there were a native function to do that - would be nice if the dtype argument recognized a passed type as the dtype for all columns, whereas a dict was for the specific columns mentioned. – flyingmeatball Dec 21 '15 at 03:57
  • 1
    @flyingmeatball That would be to difficult to add I think, I opened an enhancement request: https://github.com/pydata/pandas/issues/11886 – joris Dec 22 '15 at 15:18
  • @joris why `NVARCHAR` instead of `VARCHAR` here? – David542 Oct 23 '20 at 22:05
  • 1
    @David542 that was the original question, see eg here for an explanation for the difference between both: https://stackoverflow.com/questions/144283/what-is-the-difference-between-varchar-and-nvarchar – joris Oct 24 '20 at 07:20

Python's standard collections library includes the defaultdict class, which lets us specify, via a lambda expression, what value should be returned when a key is missing.

Putting this to use for your example:

from sqlalchemy.types import NVARCHAR
from collections import defaultdict
always_nvarchar = defaultdict(lambda: NVARCHAR(length=255))
# ...
column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = always_nvarchar)
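One caveat worth checking against your pandas version: a defaultdict only manufactures a value on an actual key lookup (d[k]); a membership test (k in d) does not trigger the factory, so if pandas guards the lookup with an `in` check, the default never fires. A stdlib-only demonstration of that behavior (the string value is just a stand-in for an sqlalchemy type):

```python
from collections import defaultdict

always_text = defaultdict(lambda: "NVARCHAR(255)")  # stand-in for NVARCHAR(length=255)

# __getitem__ triggers the factory and inserts the key...
value = always_text["some_column"]

# ...but a plain membership test does not
missing = "other_column" in always_text
print(value, missing)
```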
Eric Kramer