
I am using pandas 0.16 and SQLAlchemy. Is it possible to export just the structure of a DataFrame to SQL, i.e. the column names and data types but no rows?

The closest I managed to get was to export the first row only:

df.ix[[0], :].to_sql(tablename, myconnection)

And then I'd have to truncate the table. However, there are inconsistencies between the to_csv and to_sql methods: to_csv writes boolean fields as the strings 'TRUE' and 'FALSE', whereas to_sql writes them as 0 or 1. This means that importing files created with DataFrame.to_csv is more complicated than it should be.

If I run

df.ix[[], :].to_sql(tablename, myconnection)

that doesn't work because all columns are exported as text.

Pythonista anonymous

3 Answers


You can use the get_schema function:

from pandas.io.sql import get_schema

engine = ...  # an existing SQLAlchemy engine
df = ...      # your DataFrame
get_schema(df, 'table_name', con=engine)

This will give you, as a string, the schema that would otherwise be created, which you can then execute with engine.execute.
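For instance, a minimal sketch (assuming engine is a working SQLAlchemy engine, df is the DataFrame whose structure you want, and the table name is made up):

from pandas.io.sql import get_schema

# Build the CREATE TABLE statement as a string, then run it
# to create the empty table with the inferred column types.
schema = get_schema(df, 'my_table', con=engine)
engine.execute(schema)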

Further, the reason to_sql writes your boolean data as 0s and 1s is that SQL Server has no boolean data type (see e.g. Is there a Boolean data type in Microsoft SQL Server like there is in MySQL?)

joris
  • Thank you. Is get_schema documented anywhere? I couldn't find it in http://pandas.pydata.org/pandas-docs/version/0.16.0/search.html?q=get_schema&check_keywords=yes&area=default – Pythonista anonymous Apr 21 '15 at 07:25
  • You are correct, this is not in the API docs. I opened an issue for this: https://github.com/pydata/pandas/issues/9960 – joris Apr 21 '15 at 07:52
  • But, can you solve your problem with this? Or not yet? Another possibility would be to make a `io.sql.SQLTable` object and get its `table` attribute, which gives an SQLAlchemy `Table` that you can create with `create()` (see the sketch after these comments) – joris Apr 21 '15 at 07:56
  • It seems I can pass a dictionary to dtype if I want to override pandas and SQLAlchemy's default data type mapping. This is useful. One question remains: can I control the constraints that pandas creates on the table (e.g. for boolean columns, it adds the constraint that the values must be 0 or 1)? I can run a SQL statement to drop or disable the constraints, but I'd like the option to create the table with no constraints at all in the first place – Pythonista anonymous Apr 21 '15 at 08:09
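A sketch of the SQLTable route from the comment above (these are private, undocumented pandas internals as of 0.16, so treat the exact names as an assumption; the table name is made up):

from pandas.io.sql import SQLDatabase, SQLTable

# Assumes `engine` is an existing SQLAlchemy engine and `df` the source DataFrame.
db = SQLDatabase(engine)
table = SQLTable('my_table', db, frame=df, index=False)

# `table.table` is the underlying SQLAlchemy Table; creating it
# yields an empty table with the dtypes inferred from `df`.
table.table.create(engine)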

.to_sql() supports a dtype= argument that takes a dict mapping column names to SQLAlchemy types.

import sqlalchemy

df.ix[[], :].to_sql(tablename, myconnection, dtype={
    'column1': sqlalchemy.types.Float,
    'column2': sqlalchemy.types.BigInteger,
    'column3': sqlalchemy.types.Date,
})

... will let you map the columns to their respective SQLAlchemy types.
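If spelling out every column by hand is too painful, one option is to build the dtype dict from the frame's own dtypes. A sketch, where the dtype-kind-to-type mapping is an assumption you would adapt to your backend:

import sqlalchemy.types as st

# Hypothetical mapping from numpy dtype kinds to SQLAlchemy types.
KIND_MAP = {'f': st.Float, 'i': st.BigInteger, 'b': st.Boolean,
            'M': st.DateTime, 'O': st.Text}

dtype = {col: KIND_MAP.get(df[col].dtype.kind, st.Text) for col in df.columns}
df.ix[[], :].to_sql(tablename, myconnection, dtype=dtype)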

S Anand
  • so, to recap, if I transfer at least one row, pandas/sqlalchemy create the proper data type, otherwise they create all columns as text. The only way to get round the issue is to manually and painfully specify the data type of each column, which is precisely what I wanted to avoid. Pandas' half-baked SQL support gets more and more frustrating by the second! – Pythonista anonymous Apr 20 '15 at 16:09
  • Yes, and I'm afraid the problem here is in [this line](https://github.com/pydata/pandas/blob/8fe1cf6d1931071c378fcc3170b82a145a037da0/pandas/io/sql.py#L820). `data.ix[[0], :].iloc[:, 0].dtype` returns the right dtype, but `data.ix[[], :].iloc[:, 0].dtype` does not. Might be worth raising an issue. – S Anand Apr 20 '15 at 16:32

Maybe you can try the drop method to drop all rows.

import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv("c:\\Users\\ym\\Desktop\\out.csv")
# drop every row, keeping the columns and their dtypes
df = df.drop(df.index)
engine = create_engine('mysql://root:root@127.0.0.1:3306/test?charset=utf8', echo=False)
table_name = 'out'  # name of the target table
df.to_sql(name=table_name, con=engine, if_exists='replace', chunksize=1000, index=False)
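
Equivalently, df.iloc[0:0] produces the same empty frame with the columns and dtypes intact, without calling drop at all.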
alen