
My requirement is to load a large Excel file [more than 160k rows of data and around 150 columns] placed on a file server into an Oracle DB table. It has a couple of constraints though -

  1. The position of two columns [let's say 'EmpID' & 'AcctNum'] might vary in the Excel file - e.g. 'EmpID' can appear in column 'A' at times but in column 'E' at other times. Thus the data mapping has to be dynamic.

  2. The file can arrive at any time of the day and should be processed within an hour of its creation on the file server, and multiple files can arrive in a day. Thus I have to create some batch process in Oracle [like running a dtsx package through SQL Server jobs].

The OS where the batch is to be run is Unix.

  1. Performance is a key challenge here, so kindly treat being performance-effective as a criterion.

Please advise on how we can do this [preferably using freeware/open-source tools].

Thanks & regards, Arka

  • First you need to normalize that Excel file to CSV format with the right values for each row... then you can follow these steps to do a bulk insert into your database: http://stackoverflow.com/questions/22241428/oracle-bulk-import – Hackerman Oct 20 '16 at 16:27
  • I would suggest using [Apache POI](http://poi.apache.org/spreadsheet/) to process the Excel file, then implement your business logic to map the data to the database table columns, and use JDBC to insert the data into the database (see the sketch after these comments). – sanastasiadis Oct 20 '16 at 16:45
  • You may also want to look into this related question: http://stackoverflow.com/questions/120001/load-excel-data-sheet-to-oracle-database. – sanastasiadis Oct 20 '16 at 16:45
  • Does the entire file have the same layout when it comes, or does the layout vary throughout the file? Irrespective of whether the file has a homogeneous or heterogeneous layout, how can you determine the correct columns from which to locate your data? Are there column headings or other special records, either at the top of a homogeneous file or wherever a heterogeneous file changes format? – Sentinel Oct 20 '16 at 21:50
  • The first row is going to be the header. – animark Oct 21 '16 at 10:29
  • I think Pentaho ETL / Data Integration (aka Kettle) is an open-source product, but I am not sure about building the product from source. Its 30-day trial version is a full version and might do the job for you. [See github](https://github.com/pentaho/pentaho-kettle) – Sabir Khan Oct 24 '16 at 11:43
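
Following the Apache POI suggestion in the comments above, here is a minimal sketch of the dynamic column mapping, assuming POI is on the classpath and that (as stated) the first row is the header; the class name `HeaderMapper` and the header labels `EmpID`/`AcctNum` are only placeholders:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class HeaderMapper {

    // Read the header row of the first sheet and map each column name to its index,
    // so 'EmpID' and 'AcctNum' are found no matter which column they land in.
    public static Map<String, Integer> mapHeaders(File excelFile) throws Exception {
        try (Workbook wb = WorkbookFactory.create(excelFile)) {
            Sheet sheet = wb.getSheetAt(0);
            Row header = sheet.getRow(0);
            Map<String, Integer> positions = new HashMap<>();
            for (Cell cell : header) {
                positions.put(cell.getStringCellValue().trim(), cell.getColumnIndex());
            }
            return positions;   // e.g. {"AcctNum"=0, "EmpID"=4, ...}
        }
    }
}
```

Note that POI's usermodel keeps the whole workbook in memory, which may be tight for 160k rows x 150 columns; POI's streaming (event/SAX) API, or exporting the sheet to CSV first as the first comment suggests, avoids that.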

1 Answer


Personally I would write a Java program which could then be called by a Unix shell script. The Java program would be very simple (a sketch follows after the list):

  • open a try block
  • open a DB connection with autocommit set to false
  • create a prepared statement
  • read your file, line by line
  • for each line,
  • tokenize its fields
    • here you have to be smart in order to handle your dynamic fields; of course there has to be a rule
  • put the values into the prepared statement
  • execute it
  • when you reach the end of the file,
  • commit
  • close the try block and open a finally block
  • rollback there if anything failed

Leaving autocommit at true would slow down the execution.
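
A minimal sketch of those steps, assuming the sheet has been exported to a comma-separated file, a `headerIndex` map like the one shown after the comments, and a hypothetical target table `EMP_ACCT(EMP_ID, ACCT_NUM)`; JDBC batching (`addBatch`/`executeBatch`) is layered on top of the single commit to cut down round-trips:

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;

public class CsvToOracleLoader {

    // Hypothetical table/column names - adjust to the real schema.
    private static final String INSERT_SQL =
            "INSERT INTO EMP_ACCT (EMP_ID, ACCT_NUM) VALUES (?, ?)";

    public static void load(String csvPath, String jdbcUrl, String user, String pwd,
                            Map<String, Integer> headerIndex) throws Exception {
        try (Connection con = DriverManager.getConnection(jdbcUrl, user, pwd);
             BufferedReader in = Files.newBufferedReader(Paths.get(csvPath))) {

            con.setAutoCommit(false);                      // one commit at the end, as in the steps above
            try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
                int empIdx  = headerIndex.get("EmpID");    // positions resolved from the header row
                int acctIdx = headerIndex.get("AcctNum");

                in.readLine();                             // skip the header line
                String line;
                int rows = 0;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split(",", -1);      // naive split; use a CSV library if fields can contain commas
                    ps.setString(1, f[empIdx]);
                    ps.setString(2, f[acctIdx]);
                    ps.addBatch();
                    if (++rows % 1000 == 0) {
                        ps.executeBatch();                 // flush every 1000 rows
                    }
                }
                ps.executeBatch();                         // flush the remainder
                con.commit();
            } catch (Exception e) {
                con.rollback();                            // undo the partial load, then let the caller see the error
                throw e;
            }
        }
    }
}
```

A Unix shell script invoked by cron (or any other scheduler) can then watch the drop directory and call this loader for each new file, which covers the "processed within an hour" requirement.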

  • The requirement got deferred, so I never had a chance to do the POC. However, I really appreciate your comments and will work on it whenever I get a chance. Thanks! – animark Feb 08 '17 at 15:56