I am using Python 3.6 to walk through a folder structure and return the file paths of all the CSVs I want to import into two existing Oracle tables.

import os
import cx_Oracle

con = cx_Oracle.connect('BLAH/BLAH@XXX:666/BLAH')

#Targets the exact file paths of the CSVs we want to import into the Oracle database
if os.access(base_cust_path, os.W_OK):
    for path, dirs, files in os.walk(base_cust_path):
        if "Daily" not in path and "Daily" not in dirs and "Jul" not in path and "2017-07" not in path:
            for f in files:
                if "OUTPUT" in f and "MERGE" not in f and "DD" not in f:
                    print("Import to OUTPUT table: " + path + "/" + f)
                    #Run function to import to SQL Table 1
                if "MERGE" in f and "OUTPUT" not in f and "DD" not in f:
                    print("Import to MERGE table: " + path + "/" + f)
                    #Run function to import to SQL Table 2

A while ago I was able to use PHP to produce a function that used the BULK INSERT SQL command for SQL Server:

function bulkInserttoDB($csvPath){
    $tablename = "[DATABASE].[dbo].[TABLE]";
    $insert = "BULK
                INSERT ".$tablename."
                FROM '".$csvPath."'
                WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')";

    print_r($insert);
    print_r("<br>");

    $result = odbc_prepare($GLOBALS['connection'], $insert);
    odbc_execute($result) or die(odbc_error($GLOBALS['connection']));
}

I was looking to replicate this in Python, but a few Google searches led me to believe there is no 'BULK INSERT' command for Oracle. That BULK INSERT command had awesome performance.

Since these CSVs I am loading are huge (2GB x 365), performance is crucial. What is the most efficient way of doing this?

dandev91

1 Answer

The bulk insert is done with the cx_Oracle library using the following commands:

con = cx_Oracle.connect(CONNECTION_STRING)
cur = con.cursor()

## Prepare your statement once
cur.prepare("""INSERT INTO MyTable VALUES (
                   to_date(:1, 'YYYY/MM/DD HH24:MI:SS'),
                   :2,
                   :3,
                   to_date(:4, 'YYYY/MM/DD HH24:MI:SS'),
                   :5,
                   :6,
                   to_date(:7, 'YYYY/MM/DD HH24:MI:SS'),
                   :8,
                   to_date(:9, 'YYYY/MM/DD HH24:MI:SS'))""")

## Prepare your data: one tuple per CSV line
rows = []
for line in open(csv_path):
    sline = line.rstrip("\n").split(",")
    rows.append((sline[0], sline[1], sline[2], sline[3], sline[4],
                 sline[5], sline[6], sline[7], sline[8]))

## Insert the whole batch in one call
cur.executemany(None, rows)
con.commit()

You prepare the insert statement once, build a list of row tuples from your file, and finally call executemany(), which sends the whole batch to the database in a single call.
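For files as large as the ones in the question (2GB each), building one list holding every row may exhaust memory. Below is a minimal sketch of the same executemany() approach done in chunks; MyTable, the column layout, csv_path, and BATCH_SIZE are illustrative assumptions, not taken from the question:

import cx_Oracle

BATCH_SIZE = 50000  # illustrative; tune to available memory

def load_csv(con, csv_path):
    cur = con.cursor()
    # Prepare the statement once; it is reused by every executemany() call below
    cur.prepare("""INSERT INTO MyTable VALUES (
                       to_date(:1, 'YYYY/MM/DD HH24:MI:SS'),
                       :2, :3,
                       to_date(:4, 'YYYY/MM/DD HH24:MI:SS'),
                       :5, :6,
                       to_date(:7, 'YYYY/MM/DD HH24:MI:SS'),
                       :8,
                       to_date(:9, 'YYYY/MM/DD HH24:MI:SS'))""")
    rows = []
    with open(csv_path) as f:
        for line in f:
            rows.append(tuple(line.rstrip("\n").split(",")))
            if len(rows) >= BATCH_SIZE:
                cur.executemany(None, rows)  # one round trip per batch
                rows = []
    if rows:
        cur.executemany(None, rows)  # flush the final partial batch
    con.commit()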

Steven
  • This is more of what I was hoping to find. Will try this now. Will test it on a few CSVs first before committing to the full set. – dandev91 Aug 10 '17 at 00:54
  • But honestly, I think performance is better using Oracle tools like SQL*Loader ... – Steven Aug 10 '17 at 00:57
  • With executemany(), look at the cx_Oracle batcherrors feature to help diagnose invalid data issues. – Christopher Jones Aug 10 '17 at 08:30
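As the last comment suggests, cx_Oracle's batcherrors option lets an executemany() call continue past bad rows and report them afterwards. A minimal sketch, reusing the prepared statement and rows list from the answer above:

# Enable batch error reporting so one bad row does not abort the whole batch
cur.executemany(None, rows, batcherrors=True)

# Each error carries the offending row's index (offset) into the rows list
for error in cur.getbatcherrors():
    print("Row", error.offset, "failed:", error.message)

con.commit()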