1

I've used SQL Server and Python for several years, and I've used Insert Into and df.iterrows, but I have never tried to push all the contents of a data frame to a SQL Server table. I'm working with some larger data sets now, and I'd like to find an efficient way to move everything in a data frame to a table in SQL Server.

I'm testing this code.

# first I loop through a few files and append everything to a list
# this works fine

# convert the list to a data frame
df_append = DataFrame(df_append)
df_append.shape
type(df_append)

# log into DB
import pyodbc
driver= '{SQL Server Native Client 11.0}'

conn_str = (
    r'DRIVER={SQL Server};'
    r'SERVER=LAPTOP-CEDUMII6;'
    r'DATABASE=TestDB;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)

cursor = cnxn.cursor()
cursor.execute('SELECT * FROM FFIEC_CDR_Call_Schedule_RIBII')

for row in cursor:
    print('row = %r' % (row,))

# can log into the DB just fine...
# now I am trying to move the contents of the data frame to the table...

# Here is attempt #1...
df_append.to_sql('FFIEC_CDR_Call_Schedule_RIBII', cnxn, index=False, if_exists='replace')

# Error:
df_append.to_sql('FFIEC_CDR_Call_Schedule_RIBII', cnxn, index=False, if_exists='replace')
Traceback (most recent call last):

  File "C:\Users\ryans\Anaconda3\lib\site-packages\pandas\io\sql.py", line 1681, in execute
    cur.execute(*args, **kwargs)

ProgrammingError: ('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW); [42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)")


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "<ipython-input-87-2d90babfc8a7>", line 1, in <module>
    df_append.to_sql('FFIEC_CDR_Call_Schedule_RIBII', cnxn, index=False, if_exists='replace')

  File "C:\Users\ryans\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2615, in to_sql
    method=method,

  File "C:\Users\ryans\Anaconda3\lib\site-packages\pandas\io\sql.py", line 598, in to_sql
    method=method,

  File "C:\Users\ryans\Anaconda3\lib\site-packages\pandas\io\sql.py", line 1827, in to_sql
    table.create()

  File "C:\Users\ryans\Anaconda3\lib\site-packages\pandas\io\sql.py", line 721, in create
    if self.exists():

  File "C:\Users\ryans\Anaconda3\lib\site-packages\pandas\io\sql.py", line 708, in exists
    return self.pd_sql.has_table(self.name, self.schema)

  File "C:\Users\ryans\Anaconda3\lib\site-packages\pandas\io\sql.py", line 1838, in has_table
    return len(self.execute(query, [name]).fetchall()) > 0

  File "C:\Users\ryans\Anaconda3\lib\site-packages\pandas\io\sql.py", line 1693, in execute
    raise ex from exc

DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW); [42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)")

# Here is attempt #2...same error...
df_append.to_sql('FFIEC_CDR_Call_Schedule_RIBII', schema='dbo', con = cnxn)

I researched this before posting here, and it looks like it's doable. Something in my code must be off, what might be wrong?

halfer
  • 19,824
  • 17
  • 99
  • 186
ASH
  • 20,759
  • 19
  • 87
  • 200
  • 2
    I'd point you to Pandas [`to_sql`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html) method. It supports generating SQL for any database supported by SQLAlchemy. – gallen Dec 16 '20 at 03:08
  • 1
    I tried this: from sqlalchemy import create_engine engine = create_engine('sqlite://', echo=False) df_append.to_sql('FFIEC_CDR_Call_Schedule_RIBII', con=engine, if_exists='append') The code ran for a few seconds and it finished with no errors, but nothing was appended from the DF to the table in the DB. – ASH Dec 16 '20 at 03:30
  • It seems from [this related question](https://stackoverflow.com/questions/48307008/pandas-to-sql-doesnt-insert-any-data-in-my-table) a schema is needed. Apparently a silently failing bug that prevents data from being entered. Could this be the case with yours as well? **Note**: I haven't used `to_sql` myself, so I'm only providing hopeful/helpful hints instead of answers. – gallen Dec 16 '20 at 03:38
  • 1
    Thanks, but it's still not working. I created the table schema with 'create table...'. I think that part is fine. Nothing is being written to the table. No error shows up, so it's hard to understand what's happening here. I feel like this line is the culprit: sql_engine = create_engine('sqlite:///TestDB', echo=False) – ASH Dec 16 '20 at 03:51
  • I suspect you are right. Instead of using an engine, perhaps try a raw connection. Something like this example `conn = sqlite3.connect('TestDB.db')`? Pulled from [this very succinct example](https://datatofish.com/pandas-dataframe-to-sql/) – gallen Dec 16 '20 at 03:56
  • 1
    Thanks, but I guess it's not pointing to the right database. I tried the sample code to create a table in the DB and the first time I tried it, it ran, but nothing showed up in the database. When I ran it again, it said the table already exists, but even when I refresh the view in my database, the table doesn't exist. It must go to some kind of virtual database. I've never seen anything like that before. I'll have to research sqllite3 more tomorrow. Thanks. – ASH Dec 16 '20 at 04:04

1 Answers1

2

pandas to_sql is certainly what you're looking for. Its documentation says that the con parameter can be a

sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection

and that "Legacy support is provided for sqlite3.Connection objects.". So to_sql looks at what you've passed as con and if it is not a SQLAlchemy Connectable (Engine or Connection) then to_sql assumes that it is a sqlite3.Connection. You passed a pyodbc.Connection, which to_sql misinterprets as a sqlite3.Connection, and the resulting error is

[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'sqlite_master'.

The solution is to create a SQLAlchemy Engine object as described here and then pass that Engine object to to_sql.

p.s. For SQL Server, remember to use fast_executemany=True, e.g.,

engine = create_engine(connection_uri, fast_executemany=True)
df.to_sql(table_name, engine, …)
Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
  • Thanks Gord. I suspect Python doesn't recognize the DB. When I run this: df_append.to_sql('FFIEC_CDR_Call_Schedule_RIBII', con=engine, if_exists='append') I get this error: OperationalError: (sqlite3.OperationalError) table FFIEC_CDR_Call_Schedule_RIBII has no column named index There is definitely a field named index, but I don't know why you would even need that because it's certainly not a field. Anyway, I also tried the following: engine = create_engine(cnxn, fast_executemany=True) df_append.to_sql('FFIEC_CDR_Call_Schedule_RIBII', engine, index=False, if_exists='append') – ASH Dec 16 '20 at 14:55
  • That gives me this error: AttributeError: 'pyodbc.Connection' object has no attribute '_instantiate_plugins' – ASH Dec 16 '20 at 14:59
  • You don't pass a pyodbc.Connection object to `create_engine()`, you pass a SQLAlchemy connection URI (string) and SQLAlchemy creates the DBAPI connection for you. Check the [SQLAlchemy documentation](https://docs.sqlalchemy.org/en/13/dialects/mssql.html#module-sqlalchemy.dialects.mssql.pyodbc) again. – Gord Thompson Dec 16 '20 at 15:16
  • Hint: You can use your existing pyodbc connection string by using a SQLAlchemy [Pass through exact Pyodbc string](https://docs.sqlalchemy.org/en/13/dialects/mssql.html#pass-through-exact-pyodbc-string). – Gord Thompson Dec 16 '20 at 15:20
  • Thanks Gord. My TCP/IP port at 1433 was disabled. That didn't help things. Finally got it to run, with this: engine = "mssql+pyodbc://Server_Name/DB_Name?driver=SQL Server Native Client 11.0?trusted_connection=yes" – ASH Dec 16 '20 at 17:53