
I'm trying to parse a JSON API response into tabular form and write the result to a table on Microsoft SQL Server.

Each JSON response has 51 columns, and there are around 15 million rows in total to be written to SQL Server.

I have tried combinations of pyodbc and sqlalchemy as mentioned in other posts on Stack Overflow.

Using a traditional SQL `INSERT INTO` per row would mean hitting the database millions of times, which doesn't seem right.
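To avoid one round trip per row, the DB-API `executemany` call sends a whole batch of parameter tuples in one statement. The sketch below uses `sqlite3` only so it is self-contained; pyodbc follows the same DB-API, so the identical pattern (with `?` placeholders) applies against SQL Server. The table and column names are illustrative, not from the real schema.

```python
import sqlite3  # stand-in for pyodbc; both implement the DB-API,
                # so this executemany pattern carries over to SQL Server

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz_test (A INTEGER, B INTEGER)")

# One API response yields at most 500 rows, so batch at that granularity
rows = [(i, i * 2) for i in range(500)]

# One executemany call per batch instead of one round trip per row
conn.executemany("INSERT INTO xyz_test (A, B) VALUES (?, ?)", rows)
conn.commit()
```

With pyodbc the only changes would be the connection setup and, if available, enabling its batched insert mode for large loads.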

My pandas version is 0.14+ and my Python version is 2.7.9.

I get the following error when I try to write a sample data frame to the SQL table on Continuum.io's Wakari server:

sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.

When I run the same locally, the following error is being returned:

pyodbc.ProgrammingError: ('42S02', "[42S02] [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW)")

Following is the code:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mssql+pyodbc://<userid>:<password>@pydsn')
cnxn = engine.raw_connection()

df = pd.DataFrame({"A": range(9), "B": range(9)})
print df
df.to_sql("dbo.xyz_test", cnxn, if_exists='replace')  # this is the line that raises the error
print 'done'

Appreciate any help here.

  • You need to pass the engine itself to `to_sql`, and not a connection (support for passing a connection is planned for a future pandas release). This should fix the "Invalid object name 'sqlite_master'." error you get locally. – joris May 25 '15 at 12:23
  • But, if you want to write 15 million rows, using `to_sql` which is based on `sqlalchemy` will probably be too slow (as the drivers for sql server (`pyodbc` in your case) are known to be slow). Maybe better to look at something like this: http://stackoverflow.com/questions/15242757/import-csv-file-into-sql-server – joris May 25 '15 at 12:27
  • @joris I had tried passing the engine to to_sql first, but I got the famous "connection rollback" error. Hence I switched back to passing the raw connection. – j-v May 25 '15 at 16:43
  • The data belongs to 7 years - close to 2 million rows per year; one API response gives me max. of 500 rows, so I'd be exporting 500 rows of data at a time to the database. – j-v May 25 '15 at 16:45
  • Can you show the code/error you get when using the engine. As in any case, using a connection is for now only supported for `sqlite`, hence the "Invalid object name 'sqlite_master'" error. – joris May 25 '15 at 17:58
  • @joris Here's the exact error : AttributeError: 'Engine' object has no attribute 'rollback' – j-v May 27 '15 at 12:44
  • Please provide the full traceback. And are you sure you are using pandas 0.14 or above? Please show the exact version with `pd.__version__` – joris May 27 '15 at 13:25
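Following the suggestion in the comments above, here is a minimal sketch of passing the engine itself to `to_sql` rather than a raw connection. It uses an in-memory SQLite engine purely so the snippet is self-contained and runnable (and Python 3 syntax); for SQL Server the URL would be the `mssql+pyodbc://<userid>:<password>@pydsn` form from the question.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the SQL Server engine here
engine = create_engine('sqlite://')

df = pd.DataFrame({"A": range(9), "B": range(9)})

# Pass the engine, not engine.raw_connection(): pandas inspects the
# engine to detect the dialect, whereas a bare connection makes it
# assume SQLite, which is where the "sqlite_master" error comes from.
df.to_sql("xyz_test", engine, if_exists='replace', index=False)
```

Note that for SQL Server, newer pandas versions take the schema as a separate `schema='dbo'` argument instead of a `dbo.`-prefixed table name.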

0 Answers