5

I have to load around 15 million lines of data in a DB2 database - using java.

To keep things simple, I am not going to talk about the format of the file, or the number of tables in the database. Let's say, it is only one table with 5 columns that is getting data from a huge single csv file.

In this scenario - I need something

  1. Fast
  2. Stable

I am not really worried about

  1. Code reuse
  2. Modularity
  3. Testability

Looks to me like, a batch insert using simple java is the way to go. I am aware of Spring Batch but do not know if it is worth the effort to add that layer.

Question to group - what would you suggest. Anything in Java / Java EE world is an acceptable solution (as long as it does not require software that we need to buy).

Arjan Tijms
  • 37,782
  • 12
  • 108
  • 140
partha
  • 2,286
  • 5
  • 27
  • 37
  • 2
    DB2 has a perfectly good bulk-loading utility, why not use that? It is faster than anything you can do with Java. – Nathan Hughes Mar 18 '13 at 12:55
  • Why do you need to use java? RDBMSs usually come with an import tool that can handle this easily. Also most DB frontends (including free ones) come with an import function. – creinig Mar 18 '13 at 12:57
  • @NathanHughes - Good point. But I have purposefully not gotten into the details of file format and tables etc. In this case there are some minor but important java based modules that need to come in and massage the data that needs to go into the database. And hence the vanilla - take data source as this - and take data dump as that table - and hit execute - sort of thing, would not work here. – partha Mar 18 '13 at 12:58
  • Write a Java program to massage the data and write it to a flat file in DB2 import format. Use the DB2 utility to load the database. – Gilbert Le Blanc Mar 18 '13 at 13:00
  • That sounds promising @GilbertLeBlanc. I need to make this entire thing happen in one go without manual intervention. The trigger is for the file to be dropped in the folder. Do you think, the "DB2 bulk-loading utility" could be triggered from java? – partha Mar 18 '13 at 13:04
  • Ok, then is the problem the CSV parsing or writing it to the DB? And if it's the DB, is plain JDBC out of the question? Since you're developing in java anyway, that'd probably be the simplest solution. (Edit: It might not be enterprisey enough, of course ;) ) – creinig Mar 18 '13 at 13:05
  • I've never tried that. When I've bulk loaded into DB2, I've done it on the mainframe, which allows me to have a batch job with multiple steps. You probably want to look at the output of your Java application before you start the DB2 import. – Gilbert Le Blanc Mar 18 '13 at 13:09
  • CSV parsing not a problem @creinig. Insert to DB2 - given that we are looking at 15 million records is the problem. As I said, I guess I don't really need a enterpricey solution. I am good with anything that works, and works fast, and is stable. Plain old JDBC batch is my current stand. I guess I am trying to crowd review that stand. :) – partha Mar 18 '13 at 13:10
  • @partha: I'd say plain old JDBC should be perfectly fine. Just remember to commit your transaction from time to time :) – creinig Mar 18 '13 at 13:11
  • 1
    Looks to me as if you'll waste more time here on SO discussing your options than any time you can possible save by optimising your solution. If it's a one off, get on with it. – High Performance Mark Mar 18 '13 at 13:14
  • http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.dm.doc%2Fdoc%2Ft0004590.html – Romain Hippeau Mar 18 '13 at 13:15
  • @partha if you want to execute db2 utility you can do it using runtime class - here is an example http://stackoverflow.com/a/931663/1060656 i have done something similar in Microsoft environment. i download CSV file and then use SQL server utility called by .net code to upload data and log the output of SQL utility. – aked Mar 18 '13 at 13:21

2 Answers2

0

Using an ETL like Pentaho or Talend would be great.

Laabidi Raissi
  • 3,263
  • 1
  • 22
  • 28
  • Ahhh.. the joys of working in an enterprise. You can't get any software like either of these into the enterprise without getting into a 6 month long approval(s) chase. Also, once you get them, getting them into the project is another ordeal. I am sorry, but Java and related free libraries are the only choice at the moment. – partha Mar 18 '13 at 13:02
0

Use the load facility to get it into a new table you create.
Look at the log files to make sure there are no load errors.
Use either JDBC or a stored proc to manipulate the data.
Backup your destination table.
Use a simple insert statement to add the data to the destination table.

Romain Hippeau
  • 24,113
  • 5
  • 60
  • 79