
We have a requirement to read a CSV file and, for each line read, make a REST call. The REST call returns an array of elements; for each element returned, there shall be an update in the DB.

There is an NFR (non-functional requirement) to achieve parallel processing for the above requirement.

After reading the CSV, the processing of each individual line has to happen in parallel, i.e., there shall be a group of workers making concurrent REST calls, one per line read.

Subsequently, for each array element found in the response of the REST call, there shall be parallel updates to the DB as well.
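For context, here is roughly the shape of the flow as a plain sequential chunk-oriented step, before any parallelism. The Student/StudentDetails/Subject types, the endpoint URL, the file name, and the SQL are simplified placeholders of ours, not the real ones:

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.file.FlatFileItemReader;
    import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
    import org.springframework.core.io.FileSystemResource;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.web.client.RestTemplate;

    // Student, StudentDetails and Subject are simple POJO placeholders.
    public Step studentStep(StepBuilderFactory steps, RestTemplate rest, JdbcTemplate jdbc) {
        return steps.get("studentStep")
                .<Student, StudentDetails>chunk(10)
                .reader(studentReader())
                // one REST call per CSV line (placeholder endpoint)
                .processor(student -> rest.getForObject(
                        "http://example.com/students/{id}",
                        StudentDetails.class, student.getId()))
                // one DB update per subject element in the response
                .writer(items -> items.forEach(details ->
                        details.getSubjects().forEach(subject ->
                                jdbc.update("UPDATE subject SET score = ? WHERE id = ?",
                                        subject.getScore(), subject.getId()))))
                .build();
    }

    public FlatFileItemReader<Student> studentReader() {
        return new FlatFileItemReaderBuilder<Student>()
                .name("studentReader")
                .resource(new FileSystemResource("students.csv"))
                .delimited()
                .names(new String[] {"id", "name"})
                .targetType(Student.class)
                .build();
    }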

Any valuable suggestions / ideas on how to achieve this requirement in Spring Batch?

We have thought of two approaches. The first is to go with a master/worker design for making the REST calls on each CSV line that is read.

Each worker here corresponds to one line read from the CSV file; it performs the REST call, and when the response is returned, each of these workers becomes a master itself and launches another set of workers based on the number of array elements returned from the REST call.

Each worker launched at this second level then handles one element of the response returned above and performs its DB update, so the updates run in parallel.

Is this even possible, and would it be a good solution?
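Since Spring Batch (as far as we know) has no built-in "worker becomes master" hierarchy, we picture approximating the two levels as in the sketch below: a multi-threaded step provides the concurrent REST calls, and a parallel stream inside the writer fans out the per-element DB updates. The pool size, thread name prefix, and throttle limit are our own guesses:

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.batch.item.support.SynchronizedItemStreamReader;
    import org.springframework.core.task.SimpleAsyncTaskExecutor;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.web.client.RestTemplate;

    public Step parallelStudentStep(StepBuilderFactory steps, RestTemplate rest,
                                    JdbcTemplate jdbc) {
        // Level 1: chunks run on multiple threads, so the reader must be
        // thread-safe; wrap the FlatFileItemReader from the earlier sketch.
        SynchronizedItemStreamReader<Student> reader = new SynchronizedItemStreamReader<>();
        reader.setDelegate(studentReader());

        // Level 2: fan each response's subject updates out over a parallel
        // stream (JdbcTemplate is safe to share between threads).
        ItemWriter<StudentDetails> writer = items -> items.stream()
                .flatMap(details -> details.getSubjects().stream())
                .parallel()
                .forEach(subject -> jdbc.update(
                        "UPDATE subject SET score = ? WHERE id = ?",
                        subject.getScore(), subject.getId()));

        return steps.get("parallelStudentStep")
                .<Student, StudentDetails>chunk(10)
                .reader(reader)
                .processor(student -> rest.getForObject(
                        "http://example.com/students/{id}",
                        StudentDetails.class, student.getId()))
                .writer(writer)
                .taskExecutor(new SimpleAsyncTaskExecutor("rest-")) // level-1 workers
                .throttleLimit(8)                                   // max concurrent threads
                .build();
    }

One caveat we have read about: as far as we understand, a multi-threaded step cannot restart from the exact position the reader stopped at, which ties into our restartability question below.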

The second approach is JobStep based, i.e., launching one Job from another. If we follow this approach, how do we communicate data between the two Jobs? Suppose our first job (Job1) reads the CSV and makes the REST call, and the second job (Job2) is responsible for processing the individual response elements from that call. How do we pass the response element data from Job1 to Job2? And in this scenario, can Job1 launch multiple Job2 instances for parallel DB updates?
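To make the question concrete, this is roughly how we picture the JobStep wiring. Since job parameters only carry simple values (String, Long, Double, Date), we assume Job1 would first stage the REST responses somewhere (a table or file) and pass only a reference across; the "staging.id" key below is a made-up name for illustration:

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.core.launch.JobLauncher;
    import org.springframework.batch.core.step.job.DefaultJobParametersExtractor;

    // A step inside Job1 that launches Job2 as a child job, forwarding the
    // staging reference as a job parameter.
    public Step launchJob2(StepBuilderFactory steps, Job job2, JobLauncher jobLauncher) {
        DefaultJobParametersExtractor extractor = new DefaultJobParametersExtractor();
        // picked up from the execution context, e.g. after an
        // ExecutionContextPromotionListener promoted it in an earlier step
        extractor.setKeys(new String[] {"staging.id"});
        return steps.get("launchJob2")
                .job(job2)
                .launcher(jobLauncher)
                .parametersExtractor(extractor)
                .build();
    }

Whether Job1 could launch multiple Job2 instances this way, and whether that is a sane design, is exactly what we are unsure about.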

The solutions we have outlined may not be very clear and may be confusing; we are not sure if they are even correct and feasible.

Apologies for any confusion caused, but we are clueless about how to come up with a design for this requirement.

In either case, we are not clear on how failures will be tracked and how the job can be re-run from the failed state.

Any help is much appreciated!!

Sabari
  • For stream processing, you should definitely try Java streams. They are available since Java 8 and work efficiently for stream multithreading. – vizsatiz Nov 03 '18 at 14:34
  • I would like to help, but that's too broad. Please narrow your question (See https://stackoverflow.com/help/mcve). – Mahmoud Ben Hassine Nov 05 '18 at 09:28
  • Thank you for offering to help, and sorry for not making this clear. – Sabari Nov 05 '18 at 17:04
  • We have Student data in a CSV file, where each line corresponds to a Student, like so: Student1, Student2, ..., StudentN. For each Student, there needs to be a REST call to gather the student details. The Student details returned will contain a subject array of elements (Id, Name, Score, etc.); each array element has to be updated in the DB. We would like to do parallel processing of each Student row in the CSV, and subsequently we also want to do parallel processing of the subject array elements pertaining to each student. – Sabari Nov 05 '18 at 17:17
  • ok thank you for the clarification. You can use the AsyncItemProcessor to process each student line in a separate thread in parallel. Here is a use case very similar to what you want to do with more details in the answer: https://stackoverflow.com/a/52309260/5019386. Hope this helps. – Mahmoud Ben Hassine Nov 06 '18 at 12:31
  • Thank you for the details!! It helps. We were also thinking of using this in combination with Remote Chunking or Partitioning for horizontal scaling. – Sabari Nov 06 '18 at 16:59
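Update: following the AsyncItemProcessor suggestion in the comments above, here is our understanding of it as a minimal sketch. It needs the spring-batch-integration dependency; the pool size and the injected restProcessor/dbWriter beans are our assumptions:

    import java.util.concurrent.Future;

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.integration.async.AsyncItemProcessor;
    import org.springframework.batch.integration.async.AsyncItemWriter;
    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

    // The delegate processor (the REST call) runs on a task executor, so each
    // student line is processed on a separate thread; the AsyncItemWriter
    // unwraps the Futures before delegating the DB updates.
    public Step asyncStudentStep(StepBuilderFactory steps,
                                 ItemProcessor<Student, StudentDetails> restProcessor,
                                 ItemWriter<StudentDetails> dbWriter) {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(8);   // assumed pool size
        executor.initialize();

        AsyncItemProcessor<Student, StudentDetails> asyncProcessor = new AsyncItemProcessor<>();
        asyncProcessor.setDelegate(restProcessor);
        asyncProcessor.setTaskExecutor(executor);

        AsyncItemWriter<StudentDetails> asyncWriter = new AsyncItemWriter<>();
        asyncWriter.setDelegate(dbWriter);

        return steps.get("asyncStudentStep")
                .<Student, Future<StudentDetails>>chunk(10)
                .reader(studentReader())   // the same CSV reader as above
                .processor(asyncProcessor)
                .writer(asyncWriter)
                .build();
    }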

0 Answers