
I need to import data from Postgres into Oracle. I'm trying to do this in Python using psycopg2, pandas, and cx_Oracle. First I read the data:

import pandas as pd
import psycopg2

sql = 'select * from test'
conn = psycopg2.connect(dbname='db', user='usr', password='pswd', host='...')
df = pd.read_sql_query(sql, conn)
conn.close()

Then I try to load it into Oracle:

dsn = """(DESCRIPTION=(LOAD_BALANCE=ON)(FAILOVER=ON)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=...)(PORT=1521))
                                            (ADDRESS=(PROTOCOL=TCP)(HOST=...)(PORT=1521))
                                            (ADDRESS=(PROTOCOL=TCP)(HOST=...)(PORT=1521)))
             (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=test)))"""
usr = "usr"
pswd = "user"

engine = create_engine('oracle+cx_oracle://%s:%s@%s' % (usr, pswd, dsn))
conn = engine.connect()
conn.execute("TRUNCATE TABLE test_or")

df.to_sql(name = 'test_or', con = engine, schema = 'sch', index = False, if_exists='append')

This works quickly for tables of around 3,000,000 rows, but it becomes slow and laggy when pulling data from tables of 70,000,000 rows.

Is it possible to speed up the process? I wanted to partition the query and iterate over the parts, but I can't work out how to split the query.
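One way to split the query, assuming `test` has an indexed integer primary key (called `id` here, which is an assumption — adjust to the real schema), is to page through it by key ranges and append each chunk separately. A minimal sketch:

```python
def id_ranges(min_id, max_id, chunk_size):
    """Split the inclusive key range [min_id, max_id] into chunks."""
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def copy_in_chunks(pg_conn, oracle_engine, chunk_size=500_000):
    # pandas is only needed for the actual transfer, so import it here.
    import pandas as pd

    with pg_conn.cursor() as cur:
        cur.execute("select min(id), max(id) from test")
        min_id, max_id = cur.fetchone()

    for lo, hi in id_ranges(min_id, max_id, chunk_size):
        chunk = pd.read_sql_query(
            "select * from test where id between %(lo)s and %(hi)s",
            pg_conn, params={"lo": lo, "hi": hi})
        chunk.to_sql(name="test_or", con=oracle_engine, schema="sch",
                     index=False, if_exists="append")
```

Each chunk is fetched and written independently, so memory stays bounded. pandas' own `chunksize` argument to `read_sql_query` is an alternative, but with psycopg2's default client-side cursor the whole result set is still pulled into client memory first, so explicit key ranges scale better for very large tables.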

  • I think that Python is excessive here. Piping from `psql` with the `copy` command to `sqlldr` should do the job. – Abelisto Apr 12 '21 at 19:40
  • You could also try using oracle_fdw to push the data directly from the Postgres database. – Jeremy Apr 12 '21 at 20:01
  • @Jeremy I can't install anything or create tables. If I understand correctly how to use fdw, I need to install it first. – FeoJun Apr 12 '21 at 20:47
  • @Abelisto Can you please explain how to do this? I have never used psql or sqlldr before. – FeoJun Apr 12 '21 at 20:52
  • OK, you need to read the documentation first. Assuming that you are on a *nix system, it could be something like `PGPASSWORD=pswd psql -d db -U usr -h ... -c "copy test to stdout with (format csv)" | sqlldr ...` (I have also never used `sqlldr`, but it should be simple according to the documentation). – Abelisto Apr 12 '21 at 21:15

1 Answer


Have you looked at Oracle Database Gateway for ODBC? Also look into Oracle GoldenGate.

Another solution is to dump the data to a file on disk and then load it with SQL*Loader or External Tables; see INSERT of 10 million queries under 10 minutes in Oracle? and this blog.
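A sketch of the dump side using psycopg2's COPY support, plus a hypothetical helper that generates a matching SQL*Loader control file (table, file, and column names are placeholders):

```python
def sqlldr_control(csv_path, table, columns):
    """Build a minimal SQL*Loader control file for a CSV dump."""
    return (
        "LOAD DATA\n"
        f"INFILE '{csv_path}'\n"
        f"APPEND INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        f"({', '.join(columns)})\n"
    )


def dump_table_to_csv(pg_conn, table, path):
    # COPY ... TO STDOUT streams rows straight to the file without
    # building a DataFrame in memory.
    with pg_conn.cursor() as cur, open(path, "w", newline="") as f:
        cur.copy_expert(f"COPY {table} TO STDOUT WITH (FORMAT csv)", f)
```

`sqlldr` can then be pointed at the generated control file; its `DIRECT=TRUE` option enables direct-path loading, which bypasses much of the conventional insert overhead.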

If you are staying with Python, and since performance is important, use the native driver directly instead of Pandas (which adds another layer the data has to pass through). Review the cx_Oracle documentation on Batch Statement Execution and Bulk Loading. You can choose your own partitioning scheme to process batches of rows at a time. Another example is https://github.com/oracle/python-cx_Oracle/issues/548#issuecomment-816007567
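A hedged sketch of what batched `executemany` could look like; the column names `c1`, `c2` are placeholders for the real `test_or` columns:

```python
def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_rows(dsn, user, password, rows, batch_size=50_000):
    import cx_Oracle  # only needed for the actual load

    conn = cx_Oracle.connect(user, password, dsn)
    cur = conn.cursor()
    # Placeholder column list -- adjust to the real test_or schema.
    sql = "insert into sch.test_or (c1, c2) values (:1, :2)"
    for batch in batches(rows, batch_size):
        cur.executemany(sql, batch)
    conn.commit()
    conn.close()
```

Binding batches of tuples through `executemany` sends one round trip per batch instead of one per row, which is where most of the wall-clock time goes when inserting tens of millions of rows.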

Also look at the DB schema: review indexes, "append" hints, etc.
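For instance, Oracle's APPEND_VALUES hint requests direct-path loading for array (`executemany`) inserts; the SQL-building helper below is hypothetical, shown only to illustrate where the hint goes:

```python
def append_hint_sql(table, columns):
    """Build an INSERT that uses Oracle's APPEND_VALUES hint, which
    requests direct-path loading for executemany (array) inserts."""
    binds = ", ".join(f":{i + 1}" for i in range(len(columns)))
    return (f"insert /*+ APPEND_VALUES */ into {table} "
            f"({', '.join(columns)}) values ({binds})")
```

Note that a direct-path insert takes an exclusive lock on the table and the session must commit before it can read the table again, so it suits one-off bulk loads rather than concurrent writes.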

Christopher Jones