0

I am using pandas 0.16 and sqlalchemy to export data to a Microsft SQL Server 2014 database. The dataframe to_sql method automatically creates certain constraints on the table, e.g. it creates a constraint that a boolean column must be either 0 or 1.

I suspect these constraints are slowing down the export process. Is there a way to disable them, at least temporarily (i.e. re-enabling them only after all the data is in SQL)?

Also, is this documented anywhere? I couldn't find any mention of this, neither in the pandas docs nor in the sqlalchemy.

ChrisF
  • 134,786
  • 31
  • 255
  • 325
Pythonista anonymous
  • 8,140
  • 20
  • 70
  • 112
  • I have just noticed that to_sql creates two duplicate constraints for each boolean field. This slows everything down. – Pythonista anonymous Apr 20 '15 at 16:26
  • This is also followed up here: https://github.com/pydata/pandas/issues/9955 – joris Apr 20 '15 at 18:01
  • Further, pandas is a community project, so contributions to improve the documentation (or even just opening issues to explain in more detail what is lacking in the docs) are very welcome! – joris Apr 20 '15 at 18:02
  • @Joris, it was ME who reported the bug to github. I registered under a different username, but it was me. You'll notice the github report was submitted shortly after I asked this question here – Pythonista anonymous Apr 21 '15 at 07:21
  • I know it was you, but still it is useful to have this link here for other people looking at this question if the discussion happens there further. And again, I really appreciate that you report such a (probable) bug! Can you provide some more details in the github issue? – joris Apr 21 '15 at 07:40

1 Answers1

0

The reason to_sql writes your boolean data as 0 and 1's, is because SQL Server has no boolean data type (see eg Is there a Boolean data type in Microsoft SQL Server like there is in MySQL?).
In such cases, SQLAlchemy by default adds those constraints to the column, as is documented in the SQLAlchemy docs: http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.Boolean

Using the possibility of overriding the default type with the dtype argument in to_sql (documented here), you can specify to not create this constraint:

from sqlalchemy.types import Boolean
df.to_sql('name', engine, dtype={'my_bool_col': Boolean(create_constraint=False)})
Community
  • 1
  • 1
joris
  • 133,120
  • 36
  • 247
  • 202
  • I see - thanks. So I'd need to specify the option for each boolean column? Also, the to_sql method creates two such constraints for each boolean column. One possibility is that pandas creates one and sql_alchemy creates another, but I'm not sure yet. Either way - it's wrong! For now I'm finding it less fiddly to disable or drop the constraints with a SQL statement. – Pythonista anonymous Apr 21 '15 at 09:55
  • Yes, for now you need to specify it for each column (unless sqlalchemy provides something for that). But you can do this with eg `{col: Boolean(create_constraint=False) for col in df.select_dtypes(['bool']).columns}`. You can always report an issue for such an enhancement if you think of a way to make this easier. – joris Apr 21 '15 at 10:19
  • For the duplicate constraint, please give some more details on the github issue. I already asked there, but it would be useful to show what you find wrong. – joris Apr 21 '15 at 10:20