
Is there a way to load data from a flat file into an Oracle table? I am using the following Python code, but the file is too big and the script stops after some time (due to a lost DB connection).

from tqdm import tqdm
import cx_Oracle  # conn and cur below are a cx_Oracle connection and cursor created earlier (not shown)

insert_sty = "insert into MRSTY (CUI,TUI,STN,STY,ATUI,CVF) values (:0,:1,:2,:3,:4,:5)"
records = []

file_path = "../umls_files/umls-2023AA-metathesaurus-full/2023AA/META/MRSTY.RRF"
num_lines = sum(1 for line in open(file_path))  # count lines up front so tqdm can show progress

with open(file_path, 'r') as f:
    for line in tqdm(f, total=num_lines, desc="Processing file"):
        line = line.strip()
        records.append(line.split("|"))

    for sublist in records:
        if sublist:
            sublist.pop()  # each MRSTY.RRF line ends with a trailing "|", so drop the empty last field

for i in tqdm(records, desc="Inserting records"):
    try:
        cur.execute(insert_sty, i)
        print("record inserted")
    except Exception as e:
        print(i)
        print("Error: ", str(e))

conn.commit()
rshar
    Does this answer your question? [sql oracle - import text file](https://stackoverflow.com/questions/41807947/sql-oracle-import-text-file) Or [LOAD DATA INFILE equivalent in Oracle](https://stackoverflow.com/q/8953110/2778710) Or [Loading data from a text file to a table in oracle](https://stackoverflow.com/q/3750538/2778710) – astentx May 29 '23 at 07:12
    Also note that `insert` in a `for` loop has **very poor** performance. Use [`cursor.executemany(stmt, binding)`](https://cx-oracle.readthedocs.io/en/latest/user_guide/batch_statement.html#batchstmnt). In my locally running DB for 100k rows it takes 80 seconds for `for i in lst: cur.execute(stmt, i)` and 0.21 second for batches of 1000 rows – astentx May 29 '23 at 07:39
    In addition to the comment from @astentx about using `executemany()` (which was also [mentioned to you yesterday](https://stackoverflow.com/a/76353519/4799035)), the default Thin mode of the latest version of cx_Oracle, (now called [python-oracledb](https://cjones-oracle.medium.com/open-source-python-thin-driver-for-oracle-database-e82aac7ecf5a)) is likely to be faster than cx_Oracle 8. – Christopher Jones May 30 '23 at 02:07
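
For reference, a minimal sketch of the batched executemany() approach suggested in the comments above, using python-oracledb in Thin mode; the connection details and the batch size of 1000 are assumptions:

import oracledb
from tqdm import tqdm

insert_sty = "insert into MRSTY (CUI,TUI,STN,STY,ATUI,CVF) values (:1,:2,:3,:4,:5,:6)"
file_path = "../umls_files/umls-2023AA-metathesaurus-full/2023AA/META/MRSTY.RRF"

# placeholder credentials and connect string - replace with your own
conn = oracledb.connect(user="my_user", password="my_password", dsn="localhost/orclpdb1")
cur = conn.cursor()

batch = []
with open(file_path, "r") as f:
    for line in tqdm(f, desc="Loading MRSTY"):
        # drop the newline and the empty field caused by the trailing "|"
        batch.append(line.rstrip("\n").split("|")[:-1])
        if len(batch) >= 1000:
            cur.executemany(insert_sty, batch)  # one round trip per 1000 rows
            batch = []
    if batch:
        cur.executemany(insert_sty, batch)

conn.commit()
cur.close()
conn.close()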

1 Answer


As you use Oracle, I'd suggest SQL*Loader. It is a utility which loads data really, really fast.

In a few words, how it works:

  • you have a file to be loaded
  • if you have Oracle database installed on your PC (or installed Oracle Client), you have SQL*Loader utility
  • create a control file (by following the instructions described in the link I posted)
    • it instructs the utility where to find source data, which database table is the target, are there any other options you want to include, etc.
  • run it
    • it is done by calling the sqlldr executable at the operating system command prompt
    • this is where Python might be involved - just to call the utility (see the sketch after this list)
  • wait until it is over; check the log file for more info
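
If it helps, here is a rough sketch of that calling step from Python; the control file follows the MRSTY layout from the question, while the credentials, connect string, and file names are placeholders:

import subprocess

data_file = "../umls_files/umls-2023AA-metathesaurus-full/2023AA/META/MRSTY.RRF"
control_file = "mrsty.ctl"

# a minimal control file: pipe-delimited fields loaded into the MRSTY table
ctl = """LOAD DATA
INFILE '{0}'
APPEND
INTO TABLE MRSTY
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(CUI, TUI, STN, STY, ATUI, CVF)
""".format(data_file)

with open(control_file, "w") as f:
    f.write(ctl)

# sqlldr ships with the database server or the full Oracle Client installation
subprocess.run(
    ["sqlldr", "userid=my_user/my_password@localhost/orclpdb1",
     "control=" + control_file, "log=mrsty.log"],
    check=True,
)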

Have a look at a short overview here.

P.S. Forgot to mention: if you posted the table description and provided several rows of sample data, we could've helped with composing the control file and demonstrated how it would work.

Littlefoot