Improve performance on loading CSV into Oracle using Python

Question

Im trying to load csv file of size 1GB , the file has about 1 Million records , I got below code working ,however performance has been horrible with load time for million records taking about 2 hours, is there a way I can improve performance ?

import cx_Oracle
import csv
from datetime import datetime

start=datetime.now()
connection1=cx_Oracle.connect("username", "password","oradb")
cur1 = connection1.cursor()

with open("test_file.csv", "r") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    batch_size=3000
    sql1="""insert into TEST_CSV_FILE ( START_TS, END_TS,FLDR, TSK, INST, 
    SRC_ERR,tgt_ERR) 
    values (:1, :2, :3, :4, :5, :6, :7)"""
    data = []
    for line in csv_reader:
      data.append((line[0],line[1],line[2],line[3],line[4],line[5],line[6]))
      if len(data) % batch_size == 0:
        cur1.executemany(sql1, data)
        data = []
      if data:
        cur1.executemany(sql1, data)
        connection1.commit()
        cur1.close()
connection1.close()
print(datetime.now()-start)

Do you have to do it in Python? There are ways to load directly from a CSV into Oracle, see https://stackoverflow.com/questions/6198863/oracle-import-csv-file — Barmar, Dec 11 '19 at 06:03
Is using python to do this mandatory or any other method will do (just performance is alone enough? ) ? — Nizam - Madurai, Dec 11 '19 at 06:04
Well, if you really have to use Python, you can try to add the APPEND_VALUES hint to you insert (i.e. INSERT /*+ APPEND_VALUES */ ...) and give it a try. Did you try to see where you're loosing time (e.g. network, DB side, python taking too much time to read the file, ....) — gsalem, Dec 11 '19 at 08:24
@Barmar Thanks for pointing SQL Loader, right now we are trying to use Python as there is ETL code already using Python. — okkadu, Dec 11 '19 at 13:41
@gsalem I'm loosing time in building list from what I see, the INSERT itself is doing fine on the Oracle, but I will give a try with APPEND_VALUES. — okkadu, Dec 11 '19 at 13:42
If loading the file is your pblm, then have a look [here](https://github.com/draftcode/fastcsv) — gsalem, Dec 11 '19 at 15:47

Improve performance on loading CSV into Oracle using Python

0 Answers0