
I'm trying to parse a JSON API response into tabular form and write the result to a table on Microsoft SQL Server.

Each JSON response has 51 columns, and there are around 15 million rows in total to be written to SQL Server.

I have tried combinations of pyodbc and sqlalchemy as mentioned in other posts on Stack Overflow.

Using a traditional SQL `INSERT INTO` per row would mean hitting the database millions of times, which doesn't seem right.
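To avoid one round trip per row, the DB-API `executemany` call sends a whole batch of parameter tuples in one statement. The sketch below uses `sqlite3` only so it is self-contained; pyodbc follows the same DB-API, so the identical pattern (with `?` placeholders) applies against SQL Server. The table and column names are illustrative, not from the real schema.

```python
import sqlite3  # stand-in for pyodbc; both implement the DB-API,
                # so this executemany pattern carries over to SQL Server

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz_test (A INTEGER, B INTEGER)")

# One API response yields at most 500 rows, so batch at that granularity
rows = [(i, i * 2) for i in range(500)]

# One executemany call per batch instead of one round trip per row
conn.executemany("INSERT INTO xyz_test (A, B) VALUES (?, ?)", rows)
conn.commit()
```

With pyodbc the only changes would be the connection setup and, if available, enabling its batched insert mode for large loads.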

My pandas version is 0.14+ and my Python version is 2.7.9.

I get the following error when I try to write a sample data frame to the SQL table on Continuum.io's Wakari server:

sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.

When I run the same locally, the following error is being returned:

pyodbc.ProgrammingError: ('42S02', "[42S02] [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW)")

Following is the code:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mssql+pyodbc://<userid>:<password>@pydsn')
cnxn = engine.raw_connection()

df = pd.DataFrame({"A": range(9), "B": range(9)})
print df
df.to_sql("dbo.xyz_test", cnxn, if_exists='replace')  # this is the line that raises the error
print 'done'

Appreciate any help here.

  • You need to pass the engine itself to `to_sql`, and not a connection (support for passing a connection is planned for a future pandas release). This should fix the "Invalid object name 'sqlite_master'." error you get locally. – joris May 25 '15 at 12:23
  • But, if you want to write 15 million rows, using `to_sql` which is based on `sqlalchemy` will probably be too slow (as the drivers for sql server (`pyodbc` in your case) are known to be slow). Maybe better to look at something like this: http://stackoverflow.com/questions/15242757/import-csv-file-into-sql-server – joris May 25 '15 at 12:27
  • @joris I had tried passing the engine to to_sql first, but I got the famous "connection rollback" error. Hence I switched back to passing the raw connection. – j-v May 25 '15 at 16:43
  • The data belongs to 7 years - close to 2 million rows per year; one API response gives me max. of 500 rows, so I'd be exporting 500 rows of data at a time to the database. – j-v May 25 '15 at 16:45
  • Can you show the code/error you get when using the engine. As in any case, using a connection is for now only supported for `sqlite`, hence the "Invalid object name 'sqlite_master'" error. – joris May 25 '15 at 17:58
  • @joris Here's the exact error : AttributeError: 'Engine' object has no attribute 'rollback' – j-v May 27 '15 at 12:44
  • Please provide the full traceback. And are you sure you are using pandas 0.14 or above? Please show the exact version with `pd.__version__` – joris May 27 '15 at 13:25
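Following the suggestion in the comments above, here is a minimal sketch of passing the engine itself to `to_sql` rather than a raw connection. It uses an in-memory SQLite engine purely so the snippet is self-contained and runnable (and Python 3 syntax); for SQL Server the URL would be the `mssql+pyodbc://<userid>:<password>@pydsn` form from the question.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the SQL Server engine here
engine = create_engine('sqlite://')

df = pd.DataFrame({"A": range(9), "B": range(9)})

# Pass the engine, not engine.raw_connection(): pandas inspects the
# engine to detect the dialect, whereas a bare connection makes it
# assume SQLite, which is where the "sqlite_master" error comes from.
df.to_sql("xyz_test", engine, if_exists='replace', index=False)
```

Note that for SQL Server, newer pandas versions take the schema as a separate `schema='dbo'` argument instead of a `dbo.`-prefixed table name.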

0 Answers