import cx_Oracle
import pandas as pd
from sqlalchemy import create_engine

# credentials
username = "user"
password = "password"
connectStr = "ip:port/service_name"

df = pd.read_csv("data.csv")

# connection
dsn = cx_Oracle.makedsn('my_ip', 'my_port', service_name='my_service_name')

engine = create_engine('oracle+cx_oracle://%s:%s@%s' % (username, password, dsn))

# upload dataframe to ORCLDB
df.to_sql(name="test",con=engine, if_exists='append', index=False)

How can I speed up pandas' .to_sql()? It's taking about 20 minutes to write a 120 KB CSV file (1,000 rows) as a DataFrame into the DB. The column types are all VARCHAR2(256).

Database columns: https://i.stack.imgur.com/17k8B.jpg

David Pham
  • In pure cx_Oracle you would use `executemany()`, see https://blogs.oracle.com/opal/efficient-and-scalable-batch-statement-execution-in-python-cx_oracle – Christopher Jones Dec 23 '18 at 05:10
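For reference, a minimal sketch of the executemany() approach suggested in that comment. The table name test and the column names col1/col2 are assumptions for illustration; adjust them to the real schema.

import cx_Oracle
import pandas as pd

# assumed connection string and table layout, for illustration only
connection = cx_Oracle.connect("user", "password", "ip:port/service_name")
cursor = connection.cursor()

df = pd.read_csv("data.csv")

# one round trip for the whole batch instead of one INSERT per row
rows = [tuple(r) for r in df.itertuples(index=False, name=None)]
cursor.executemany("INSERT INTO test (col1, col2) VALUES (:1, :2)", rows)
connection.commit()
cursor.close()
connection.close()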

1 Answer


What is happening here is that for every row you insert, the transaction has to complete before the next insert can start. The workaround is to do a "bulk insert": load the data into an in-memory CSV and send it to the database in a single operation. I know how to do this with Postgres (which is what I use), but I'm not sure how for Oracle. Here is the code I use for Postgres; perhaps it will be of some help.

import io

def bulk_insert_sql_replace(engine, df, table, if_exists='replace', sep='\t', encoding='utf8'):

    # Create (or replace) the target table with the DataFrame's schema, but no rows
    df[:0].to_sql(table, engine, if_exists=if_exists, index=False)

    # Serialize the DataFrame to an in-memory CSV buffer
    output = io.StringIO()
    df.to_csv(output, sep=sep, index=False, header=False, encoding=encoding)
    output.seek(0)

    # Stream the buffer into the table in a single COPY operation (psycopg2)
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.copy_from(output, table, sep=sep, null='')
    connection.commit()
    cursor.close()
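
Since copy_from() is Postgres/psycopg2-specific, it will not work against Oracle. A rough sketch of an Oracle equivalent, assuming cx_Oracle behind the SQLAlchemy engine and using executemany() as suggested in the comments above (the function name and argument layout mirror the Postgres helper and are only illustrative):

def bulk_insert_oracle(engine, df, table, if_exists='replace'):

    # Create (or replace) the target table with the DataFrame's schema, but no rows
    df[:0].to_sql(table, engine, if_exists=if_exists, index=False)

    # Build one INSERT statement with a positional bind variable per column
    cols = ", ".join(df.columns)
    binds = ", ".join(":%d" % (i + 1) for i in range(len(df.columns)))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, binds)

    # Send all rows in a single executemany() round trip instead of row-by-row INSERTs
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.executemany(sql, [tuple(r) for r in df.itertuples(index=False, name=None)])
    connection.commit()
    cursor.close()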

Here is a link to another thread that has tons of great information regarding this issue: Bulk Insert A Pandas DataFrame Using SQLAlchemy

Allen P.