I am new to batch processing and am trying to solve the problem below using Spring Batch. I am struggling with how to turn it into a multi-step batch job.

Given

A csv file having records for multiple students

studentId subject1_score subject2_score subject3_score result
1 59 51 54 PENDING
2 79 20 76 PENDING

We have a REST endpoint which takes a student's marks in all subjects and returns the result (pass/fail) for each student. The pass/fail logic is defined in that REST endpoint.

TODO

Read the records from the csv in batches and make one REST call per batch, which returns the result for each student based on the marks in all three subjects. Update the result for each student and generate the output csv for all the records.

class StudentMarksheet {
    String studentId;
    Integer subject1_score;
    Integer subject2_score;
    Integer subject3_score;
    String result;

    ...
}

class GenerateResultRequestResponseDto {
    Long batchId;
    List<StudentMarksheet> students;
    
    ...
}

Expected output

studentId subject1_score subject2_score subject3_score result
1 59 51 54 PASS
2 79 20 76 FAIL

Update on Requirement

We can receive either a csv or an xml file. Based on the file type, we have two different readers and writers (one pair for reading and writing csv files and one pair for xml files).

My Design solution

Read a single record and create a StudentMarksheet object from it -> the processor decides whether we have a valid record or not -> the writer prepares the GenerateResultRequestResponseDto, executes the REST call for one batch of records, and writes the batch to the csv file.

The big question here is: do I make two jobs, one for CSV and the other for XML?

truekiller
  • `make a REST call per batch`: Does this mean your REST endpoint accepts a list of students and not a single student? – Mahmoud Ben Hassine May 19 '21 at 16:02
  • Yes @MahmoudBenHassine, the REST endpoint accepts a list of students. One file contains 100K records, so otherwise I would end up firing 100K REST calls. Plus there will be more than one file. – truekiller May 20 '21 at 05:27
  • @MahmoudBenHassine What if we have two different types of file, xml and csv, and I now have two different ItemReaders and ItemWriters, one pair for xml and one for csv? Will I have two jobs, one for csv and one for xml? – truekiller May 24 '21 at 06:10
  • If those are independent tasks, you could run them in two parallel steps within the same job. – Mahmoud Ben Hassine May 24 '21 at 20:36
  • @MahmoudBenHassine Yes, they are independent tasks. Should I make two jobs, one for CSV and one for XML? Or, alternatively, two steps running in parallel? – truekiller May 25 '21 at 07:04
  • @MahmoudBenHassine I am just waiting a couple more days in case anybody else wants to answer this problem. After that I will accept the answer. – truekiller May 25 '21 at 10:09

1 Answer

Since your REST endpoint accepts a list of students, and you need to process them in chunks just before writing them to the file, you can use an ItemWriteListener#beforeWrite(List) and make your call in there. This listener is the first extension point where you get a list of items. So your chunk-oriented step could be designed as follows:

  • Item reader: FlatFileItemReader to read students one by one
  • Item processor: validate students
  • ItemWriteListener: Make the REST call for the current chunk of students and update their statuses
  • ItemWriter: write updated students to the output file
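
To make the ordering concrete, here is a minimal plain-Java sketch of that flow. It is not Spring Batch code: `ChunkEnrichmentSketch`, `ResultClient`, `fakeClient`, and the local `beforeWrite` and `write` methods are hypothetical stand-ins that only mirror the roles of the listener and the writer, and the "pass if every score is at least 40" rule is an assumption made so the example can run (the real rule lives in the REST endpoint).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the chunk flow, with no Spring Batch dependency.
// Every name in this class is a hypothetical stand-in for illustration.
class ChunkEnrichmentSketch {

    // Simplified version of the StudentMarksheet class from the question.
    static class StudentMarksheet {
        final String studentId;
        final int subject1Score;
        final int subject2Score;
        final int subject3Score;
        String result = "PENDING";

        StudentMarksheet(String studentId, int s1, int s2, int s3) {
            this.studentId = studentId;
            this.subject1Score = s1;
            this.subject2Score = s2;
            this.subject3Score = s3;
        }
    }

    // Stand-in for the REST client: takes one chunk of students and returns
    // one result per student, in the same order.
    interface ResultClient {
        List<String> fetchResults(List<StudentMarksheet> chunk);
    }

    // Plays the role of ItemWriteListener#beforeWrite: one call per chunk,
    // with the results copied onto the items before the writer sees them.
    static void beforeWrite(List<StudentMarksheet> chunk, ResultClient client) {
        List<String> results = client.fetchResults(chunk);
        if (results.size() != chunk.size()) {
            throw new IllegalStateException("expected one result per student");
        }
        for (int i = 0; i < chunk.size(); i++) {
            chunk.get(i).result = results.get(i);
        }
    }

    // Plays the role of the ItemWriter: formats the already-enriched items
    // as space-separated lines, like the sample in the question.
    static List<String> write(List<StudentMarksheet> chunk) {
        List<String> lines = new ArrayList<>();
        for (StudentMarksheet s : chunk) {
            lines.add(String.join(" ", s.studentId,
                    String.valueOf(s.subject1Score),
                    String.valueOf(s.subject2Score),
                    String.valueOf(s.subject3Score),
                    s.result));
        }
        return lines;
    }

    // Fake endpoint. ASSUMPTION: a student passes if every score is >= 40.
    static ResultClient fakeClient() {
        return students -> {
            List<String> out = new ArrayList<>();
            for (StudentMarksheet s : students) {
                boolean pass = s.subject1Score >= 40
                        && s.subject2Score >= 40
                        && s.subject3Score >= 40;
                out.add(pass ? "PASS" : "FAIL");
            }
            return out;
        };
    }

    public static void main(String[] args) {
        List<StudentMarksheet> chunk = Arrays.asList(
                new StudentMarksheet("1", 59, 51, 54),
                new StudentMarksheet("2", 79, 20, 76));
        beforeWrite(chunk, fakeClient());           // one "REST call" for the whole chunk
        write(chunk).forEach(System.out::println);  // writer only sees enriched items
    }
}
```

In a real step, the chunk size (commit interval) would determine how many students go into each REST call, and the writer would stay a plain file writer with no knowledge of the endpoint.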
Mahmoud Ben Hassine
  • @truekiller I believe this answers your question, so please accept it. Otherwise, let me know what is missing to accept it. – Mahmoud Ben Hassine May 24 '21 at 20:37
  • Mahmoud, could you have a look at another Spring Batch problem? https://stackoverflow.com/questions/67909123/copy-header-tag-in-xml-spring-batch-application – truekiller Jun 09 '21 at 17:45
  • Recently I came across CompositeItemWriter in Spring Batch. I could have two writers, one acting as a REST client and a second writing the data to the file. Is a composite writer a better design for my problem, or is making the REST call in an ItemWriteListener the better design as far as Spring Batch is concerned? – truekiller Jun 12 '21 at 10:15
  • It depends on your requirement: if the REST call will enrich items and should be done before items are written, then using a listener is the way to go. If the REST call is used to write items to a data sink *in addition* to another location, then a composite writer is also an option. But your question implies that the REST call will return a result (mark the student's result as pass/fail) that is needed *before* writing to the csv: with a composite writer, the **same** items are written to multiple data sources. – Mahmoud Ben Hassine Jun 13 '21 at 20:38
  • Should I call it a bad design or a wrong implementation if my first writer makes the REST call, enriches the items based on the response, and then delegates to the second writer, which writes them to the CSV file? – truekiller Jun 14 '21 at 08:29
  • If you are changing items inline (i.e. mutating the objects) in the first writer, then indeed the second writer in the composite will see those changes. This might work, but it is not as clean as using a listener to prepare items before writing them with a writer. – Mahmoud Ben Hassine Jun 15 '21 at 08:50