
My requirement is to load a large Excel file [more than 160k rows of data and around 150 columns] placed on a file server into an Oracle DB table. It has a couple of constraints though -

  1. The position of two columns [let's say 'EmpID' & 'AcctNum'] might vary in the Excel file - e.g. 'EmpID' can appear in column 'A' at times but in column 'E' at other times. Thus the data mapping has to be dynamic.

  2. The file can arrive at any time of the day and should be processed within an hour of its creation on the file server, and multiple files can arrive in a day. Thus I have to create some batch process in Oracle [like running a dtsx package through SQL Server jobs].

The OS where the batch is to be run is Unix.

  1. Performance is a key challenge here, so kindly treat being performance-effective as a criterion.

Please advise on how we can do this [preferably using freeware/open-source tools].

Thanks & regards, Arka

  • First you need to normalize that Excel file to CSV format with the right values for each row... then you can follow these steps to do a bulk insert into your database: http://stackoverflow.com/questions/22241428/oracle-bulk-import – Hackerman Oct 20 '16 at 16:27
  • I would suggest using [Apache POI](http://poi.apache.org/spreadsheet/) to process the Excel file, then implement your business logic to map the data to the database table columns, and use JDBC to insert the data into the database (see the sketch after these comments). – sanastasiadis Oct 20 '16 at 16:45
  • You may also want to look into this related question: http://stackoverflow.com/questions/120001/load-excel-data-sheet-to-oracle-database. – sanastasiadis Oct 20 '16 at 16:45
  • Does the entire file have the same layout when it comes, or does the layout vary throughout the file? Irrespective of whether the file has a homogeneous or heterogeneous layout, how can you determine the correct columns from which to locate your data? Are there column headings or other special records, either at the top of a homogeneous file or wherever a heterogeneous file changes format? – Sentinel Oct 20 '16 at 21:50
  • The first row is going to be the header. – animark Oct 21 '16 at 10:29
  • I think Pentaho ETL / Data Integration (aka Kettle) is an open-source product, but I am not sure about building the product from source. Its 30-day trial version is a full version and might do the job for you. [See github](https://github.com/pentaho/pentaho-kettle) – Sabir Khan Oct 24 '16 at 11:43
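
Following the Apache POI suggestion in the comments above, here is a minimal sketch of the dynamic column mapping, assuming POI is on the classpath and that (as stated) the first row is the header; the class name `HeaderMapper` and the header labels `EmpID`/`AcctNum` are only placeholders:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class HeaderMapper {

    // Read the header row of the first sheet and map each column name to its index,
    // so 'EmpID' and 'AcctNum' are found no matter which column they land in.
    public static Map<String, Integer> mapHeaders(File excelFile) throws Exception {
        try (Workbook wb = WorkbookFactory.create(excelFile)) {
            Sheet sheet = wb.getSheetAt(0);
            Row header = sheet.getRow(0);
            Map<String, Integer> positions = new HashMap<>();
            for (Cell cell : header) {
                positions.put(cell.getStringCellValue().trim(), cell.getColumnIndex());
            }
            return positions;   // e.g. {"AcctNum"=0, "EmpID"=4, ...}
        }
    }
}
```

Note that POI's usermodel keeps the whole workbook in memory, which may be tight for 160k rows x 150 columns; POI's streaming (event/SAX) API, or exporting the sheet to CSV first as the first comment suggests, avoids that.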

1 Answer


Personally I would write a Java program which could then be called by a Unix shell script. The Java program would be very simple (a sketch follows after the list):

  • open a try block
  • open a DB connection with autocommit set to false
  • create a prepared statement
  • read your file, line by line
  • for each line,
  • tokenize its fields
    • here you have to be smart in order to handle your dynamic fields; of course there has to be a rule
  • put the values into the prepared statement
  • execute it
  • when you reach the end of the file,
  • commit
  • close the try block and open a finally block
  • rollback there if anything failed

Leaving autocommit at true would slow down the execution.
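
A minimal sketch of those steps, assuming the sheet has been exported to a comma-separated file, a `headerIndex` map like the one shown after the comments, and a hypothetical target table `EMP_ACCT(EMP_ID, ACCT_NUM)`; JDBC batching (`addBatch`/`executeBatch`) is layered on top of the single commit to cut down round-trips:

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;

public class CsvToOracleLoader {

    // Hypothetical table/column names - adjust to the real schema.
    private static final String INSERT_SQL =
            "INSERT INTO EMP_ACCT (EMP_ID, ACCT_NUM) VALUES (?, ?)";

    public static void load(String csvPath, String jdbcUrl, String user, String pwd,
                            Map<String, Integer> headerIndex) throws Exception {
        try (Connection con = DriverManager.getConnection(jdbcUrl, user, pwd);
             BufferedReader in = Files.newBufferedReader(Paths.get(csvPath))) {

            con.setAutoCommit(false);                      // one commit at the end, as in the steps above
            try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
                int empIdx  = headerIndex.get("EmpID");    // positions resolved from the header row
                int acctIdx = headerIndex.get("AcctNum");

                in.readLine();                             // skip the header line
                String line;
                int rows = 0;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split(",", -1);      // naive split; use a CSV library if fields can contain commas
                    ps.setString(1, f[empIdx]);
                    ps.setString(2, f[acctIdx]);
                    ps.addBatch();
                    if (++rows % 1000 == 0) {
                        ps.executeBatch();                 // flush every 1000 rows
                    }
                }
                ps.executeBatch();                         // flush the remainder
                con.commit();
            } catch (Exception e) {
                con.rollback();                            // undo the partial load, then let the caller see the error
                throw e;
            }
        }
    }
}
```

A Unix shell script invoked by cron (or any other scheduler) can then watch the drop directory and call this loader for each new file, which covers the "processed within an hour" requirement.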

  • The requirement got deferred, so I never had a chance to do the POC. However, I really appreciate your comments and will work on it whenever I get a chance. Thanks! – animark Feb 08 '17 at 15:56