How does Pandas to_sql determine what dataframe column is placed into what database field?

Question

I'm currently using Pandas to_sql in order to place a large dataframe into an SQL database. I'm using sqlalchemy in order to connect with the database and part of that process is defining the columns of the database tables.

My question is, when I'm running to_sql on a dataframe, how does it know what column from the dataframe goes into what field in the database? Is it looking at column names in the dataframe and looking for the same fields in the database? Is it the order that the variables are in?

Here's some example code to facilitate discussion:

import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Numeric

engine = create_engine('sqlite:///store_data.db')
meta = MetaData()

table_pop = Table('xrf_str_geo_ta4_1511', meta,
    Column('TDLINX', Integer, nullable=True, index=True),
    Column('GEO_ID', Integer, nullable=True),
    Column('PERCINCL', Numeric, nullable=True)
)

meta.create_all(engine)

# file is the path to the source CSV; each chunk is appended to the table
for df in pd.read_csv(file, chunksize=50000, iterator=True, encoding='utf-8', sep=','):
    df.to_sql('table_name', engine, if_exists='append', index=False)

The dataframe in question has 3 columns: TDLINX, GEO_ID, and PERCINCL.

Your df.to_sql command worked really well, except for the index=index argument. It kept giving me some kind of index error until I took a guess and used index=False, and then it worked like a champ. I also didn't use the "flavor" parameter at all. Big thanks. - Bobby, Jul 18 '20 at 17:55

score 26 · Accepted Answer · answered Jan 13 '16 at 17:34

The answer is indeed what you suggest: it looks at the column names. So matching column names is important; the order does not matter.
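
As a rough illustration of the name-based matching, here is a minimal sketch. It assumes an in-memory SQLite database and a hypothetical table called demo (neither is from the question), and a recent pandas and SQLAlchemy. The dataframe's columns are deliberately in a different order than the table's:

```python
import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Numeric, text

# In-memory SQLite database and a table named 'demo': both are
# assumptions made for this illustration only.
engine = create_engine('sqlite://')
meta = MetaData()
Table('demo', meta,
      Column('TDLINX', Integer),
      Column('GEO_ID', Integer),
      Column('PERCINCL', Numeric))
meta.create_all(engine)

# Same column names as the table, but listed in a different order.
df = pd.DataFrame({'PERCINCL': [0.5], 'GEO_ID': [20], 'TDLINX': [1]})
df.to_sql('demo', engine, if_exists='append', index=False)

# Read the row back in the table's own column order.
with engine.connect() as conn:
    row = conn.execute(text('SELECT TDLINX, GEO_ID, PERCINCL FROM demo')).fetchone()

print(tuple(row))
```

Each value should end up in the field with the matching name, regardless of the order of the dataframe columns.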

To be fully correct, pandas does not actually check this itself. What to_sql does under the hood is execute an insert statement where the data to insert is provided as a dict keyed by column name, and then it is up to the database driver to handle this.
This also means that pandas will not check the dtypes or the number of columns (e.g. if not all fields of the database are present as columns in the dataframe, these will be filled with a default value in the database for these rows).
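
To make the dict-based insert more concrete, the sketch below does by hand roughly what is described above, using SQLAlchemy directly. It is not pandas' actual internal code, and the in-memory database and the table name demo are assumptions for illustration. The row dict leaves out PERCINCL, so the database fills in its default (NULL here):

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Numeric, text

# In-memory SQLite database and a table named 'demo': illustrative
# assumptions, not taken from the question.
engine = create_engine('sqlite://')
meta = MetaData()
demo = Table('demo', meta,
             Column('TDLINX', Integer),
             Column('GEO_ID', Integer),
             Column('PERCINCL', Numeric))
meta.create_all(engine)

# One row as a dict keyed by column name; key order is irrelevant,
# and PERCINCL is intentionally missing.
rows = [{'GEO_ID': 20, 'TDLINX': 1}]

# engine.begin() opens a transaction and commits it on exit.
with engine.begin() as conn:
    conn.execute(demo.insert(), rows)

with engine.connect() as conn:
    row = conn.execute(text('SELECT TDLINX, GEO_ID, PERCINCL FROM demo')).fetchone()

print(tuple(row))
```

Because the keys carry the column names, nothing on the Python side verifies that every field is present; the missing field simply takes the database default.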
