We need to merge a set of tables from one data source into another, based on the last-run date stored in a config. I have implemented this with Spring Batch and it works, but performance is too slow: it takes around 18 hours to process about 5 million records. I haven't used any multi-threading or partitioning yet. I need help finding the right design approach to improve performance. Previously this task was done with SQL*Loader and it completed in 3 hours. There are around 8 tables to be merged into the other data source. Please let me know if any more info is needed. Thanks in advance.
I guess more insight into the DB operations would help, e.g. which DML statements are involved, whether you are using joins and how, etc. – Karthik Prasad Aug 19 '14 at 13:21
1 Answer
Spring Batch is designed to let you enhance your batch jobs incrementally, from basic single-threaded processing to fully scaled multi-JVM solutions, with minimal configuration changes at each step. Without knowing much about your use case, the approach you take will really depend on your requirements:
- Do you need restartability? If so, that rules out basic multi-threaded steps, since most readers do not support multi-threaded processing with restartability.
- Is the process IO bound? I'm assuming so, which would rule out remote chunking as an option.
If the above assumptions are correct, that leaves partitioning. You can read more about partitioning vs chunking here: Difference between spring batch remote chunking and remote partitioning.
Once you've chosen partitioning as your model, the only other questions you'll need to answer are:
- What is the partitioning strategy? Partitioning sends descriptions of the data to be processed from the master to each slave, so you'll need to decide what that description consists of (in a database, id ranges are a common option).
- Local or remote? Can you get the throughput you need from a single JVM, using threads to execute the slaves, or do you need more horsepower? If you need more, you'll want to look at remote partitioning.
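As a rough illustration of the id-range strategy with local (single-JVM) execution, here is a plain-Java sketch. It assumes each table has a numeric id column with a known min and max, and it deliberately uses no Spring APIs so it runs on its own. In Spring Batch, the splitting logic would live in a `Partitioner` implementation (returning one `ExecutionContext` per range) and the threads would come from a `TaskExecutor`. The class name `IdRangePartitionSketch`, the `processRange` worker and the grid size of 8 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IdRangePartitionSketch {

    /** Inclusive id range handed to one worker (a "slave" in the answer's terms). */
    static final class IdRange {
        final long minId;
        final long maxId;

        IdRange(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    /**
     * Splits [minId, maxId] into at most gridSize contiguous, non-overlapping ranges.
     * In Spring Batch this would sit in a Partitioner, which returns one
     * ExecutionContext per partition instead of an IdRange.
     */
    static Map<String, IdRange> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        Map<String, IdRange> partitions = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        long start = minId;
        int i = 0;
        while (start <= maxId) {
            long end = Math.min(start + size - 1, maxId);
            partitions.put("partition" + i++, new IdRange(start, end));
            start = end + 1;
        }
        return partitions;
    }

    /**
     * Hypothetical worker. A real one would read only its own rows
     * (e.g. SELECT ... WHERE id BETWEEN ? AND ?) and write them to the target
     * data source. Here it just returns how many ids its range covers.
     */
    static long processRange(IdRange range) {
        return range.maxId - range.minId + 1;
    }

    public static void main(String[] args) throws Exception {
        // Local partitioning: one JVM, one thread per partition.
        Map<String, IdRange> partitions = partition(1, 5_000_000, 8);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (IdRange range : partitions.values()) {
                results.add(pool.submit(() -> processRange(range)));
            }
            long processed = 0;
            for (Future<Long> result : results) {
                processed += result.get(); // blocks until that partition finishes
            }
            System.out.println("ids covered: " + processed);
        } finally {
            pool.shutdown();
        }
    }
}
```

Running `main` prints `ids covered: 5000000` (eight ranges of 625,000 ids each). Because each partition only ever touches its own id range, workers share no reader state. In Spring Batch each partition also runs as its own step execution with its own context, which is what lets partitioning coexist with restartability. If the ids are sparse or skewed, equal-width ranges can carry very different row counts, so a real partitioner might split on row counts instead.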

Michael Minella