
I am building a typical batch step that reads from a DB, processes the records, and writes them to a file. The dataset has many millions (>10 million) of records.

Is there anything from a Design or Architecture point of view that should be kept in mind?

Also, are there any Java Batch specific coding practices that need to be kept in mind (apart from general Java best practices)?

I am using IBM's implementation of JSR 352 on WebSphere Liberty.

ragingasiancoder
Fazil Hussain
  • For the record: very often there are no **the** best practices, only people's opinions on what the best practices are. In that sense, not a really good question. – GhostCat Jul 18 '16 at 11:54
  • I just want to make sure, as I design my Java Batch solution, that I am not missing out on anything, or any feature, or going about it in a way that is not recommended. I know this isn't a very specific question, but since there is a lack of material on JSR 352 in general, I think this is a good forum to ask this question. – Fazil Hussain Jul 18 '16 at 11:58
  • You can not miss any feature. You can miss common sense :) – Alexander Petrov Jul 18 '16 at 12:16

1 Answer

  1. Don't read the same data repeatedly. If you must, make sure everything is held in memory. Think first-level cache.
  2. Ensure you don't have N+1 selects.
  3. Fast network access between the batch server and the database is essential for performance. Think 10 Gigabit Ethernet.
  4. Introduce parallelism. Parallelize the read from the database, but don't parallelize the file access unless you know the file system has more than one disk that can work in parallel.
  5. Is your data model relational? If yes, think Hibernate; if no, think JdbcTemplate.
  6. Read from the database in big chunks, and allocate enough memory for that.
  7. If you post-process the data before writing it to the file, do that in parallel as well.
  8. If the operation only reads from the DB, you don't need restartability: the operation either completes or fails. If you don't need to preserve intermediate state for the job execution, this will give you an additional performance boost.
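
To make point 4 a bit more concrete, here is a minimal sketch of one way to split the read: give each parallel reader its own contiguous primary-key range. It assumes the table has a roughly dense numeric key; the class name `KeyRangePartitioner` and the property names `firstKey` and `lastKey` are made up for illustration. In JSR 352, an array like this could be handed to a `PartitionPlan` through `setPartitionProperties` from your `PartitionMapper`, and each partition's reader would then select only the rows in its own range.

```java
import java.util.Properties;

/** Hypothetical helper: splits an inclusive key range into contiguous slices, one per partition. */
final class KeyRangePartitioner {

    private KeyRangePartitioner() {}

    /**
     * Returns one Properties object per partition, each holding the inclusive
     * "firstKey" and "lastKey" (illustrative names) that partition should read.
     * If there are more partitions than keys, the surplus partitions get an
     * empty range (lastKey is one less than firstKey).
     */
    static Properties[] split(long minKey, long maxKey, int partitions) {
        if (partitions < 1 || maxKey < minKey) {
            throw new IllegalArgumentException("need partitions >= 1 and maxKey >= minKey");
        }
        long total = maxKey - minKey + 1;
        long base = total / partitions;   // keys that every partition receives
        long extra = total % partitions;  // the first 'extra' partitions receive one more
        Properties[] plan = new Properties[partitions];
        long next = minKey;
        for (int i = 0; i < partitions; i++) {
            long size = base + (i < extra ? 1 : 0);
            Properties p = new Properties();
            p.setProperty("firstKey", Long.toString(next));
            p.setProperty("lastKey", Long.toString(next + size - 1));
            plan[i] = p;
            next += size;
        }
        return plan;
    }

    public static void main(String[] args) {
        // Example: 10 million keys spread over 4 parallel readers.
        for (Properties p : split(1, 10_000_000, 4)) {
            System.out.println(p.getProperty("firstKey") + " .. " + p.getProperty("lastKey"));
        }
    }
}
```

The ranges are disjoint and cover every key exactly once, so the readers never fetch the same row twice. If the keys are sparse or skewed, equal-width ranges will produce unequal workloads, and you would want to derive the boundaries from the data instead.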
Alexander Petrov
  • Can the use of Hibernate improve performance compared to plain JDBC? (not taking into consideration the other advantages of using Hibernate) – Fazil Hussain Jul 18 '16 at 12:18
  • Surprisingly, yes, if you have a relational model, in other words if you are fetching many relations at once. It is not that you cannot do it with JDBC as well, but the data duplication in the joined result sets will kill all the performance. And if you start implementing an algorithm to eliminate the data duplication, well... in that case just use Hibernate with appropriate fetch strategies :) Unless you want to write a new, better Hibernate, of course :) – Alexander Petrov Jul 18 '16 at 12:20