
Tried to find if this was asked before but couldn't.

Here is the problem: the following has to be achieved via Spring Batch. There is one file to be read and processed, and the item reader is not thread safe. The plan is to have multithreaded homogeneous processors and multithreaded homogeneous writers ingest items read by a single-threaded reader.

Kind of like below:

        ----------> Processor #1 ----------> Writer #1
       |
    Reader -------> Processor #2 ----------> Writer #2
       |
        ----------> Processor #3 ----------> Writer #3

I tried AsyncItemProcessor and AsyncItemWriter, but holding a debug point on the processor resulted in the reader not being executed until the point was released, i.e. single-threaded processing.
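
This is roughly what the wiring looked like; the generic types, method names, and executor choice below are simplified placeholders:

    import org.springframework.batch.integration.async.AsyncItemProcessor;
    import org.springframework.batch.integration.async.AsyncItemWriter;
    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.core.task.SimpleAsyncTaskExecutor;

    // Wiring sketch: the async processor submits each item to the task executor
    // and returns a Future; the async writer unwraps the Futures before delegating.
    public class AsyncWiring {

        public static <I, O> AsyncItemProcessor<I, O> asyncProcessor(ItemProcessor<I, O> delegate) {
            AsyncItemProcessor<I, O> async = new AsyncItemProcessor<>();
            async.setDelegate(delegate);
            // Without an explicit executor the default is a synchronous one,
            // which looks exactly like single-threaded processing.
            async.setTaskExecutor(new SimpleAsyncTaskExecutor("proc-"));
            return async;
        }

        public static <O> AsyncItemWriter<O> asyncWriter(ItemWriter<O> delegate) {
            AsyncItemWriter<O> async = new AsyncItemWriter<>();
            async.setDelegate(delegate);
            return async;
        }
    }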

A task executor was also tried, like below:

<tasklet task-executor="taskExecutor" throttle-limit="20">

This launched multiple threads on the reader.

Synchronising the reader also didn't work.

I tried to read about partitioner but it seemed complex.

Is there an annotation to mark the reader as single-threaded? Would pushing the read data to a global context be a good idea?

Please guide towards a solution.

Programmer
  • Why was partitioning too complex? – Michael Minella May 29 '17 at 15:23
  • Thank you for your comment. I just had a few minutes to look at it and the more I read the more I got confused. I have gone through it end to end and it doesn't seem to be the traditional solution for my problem 'cause I don't want to split my input. Am I wrong? – Programmer May 29 '17 at 16:19
  • Is SynchronizedItemReader the optimum solution? – Programmer May 29 '17 at 17:23
  • Yes it is, but be sure the writing is really the bottleneck. – Michael Minella May 29 '17 at 23:12
  • Yes, you are correct. The bottleneck needs to be identified first. – Programmer May 30 '17 at 04:04
  • The reason I don't want to split my input file is because it's just a couple of GBs. It would take a few seconds or so to read the file with the SynchronizedItemReader, but I want your opinion. The highest file size would be around 5 GB with records of 1500 char length. Do you think we should split files and implement partitioning, or would the SynchronizedItemReader be good? – Programmer May 30 '17 at 04:07

2 Answers


I guess nothing is built into the Spring Batch API for the pattern that you are looking for; some coding on your part would be needed to achieve it.

The ItemWriter.write method already takes a List of processed items, sized by your chunk size, so you can divide that List across as many threads as you like: spawn your own threads and pass a segment of the list to each thread to write.
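
A rough sketch of that idea; the wrapper class below is not part of Spring Batch, the delegate must itself be thread safe, and the transaction caveats further down still apply:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.springframework.batch.item.ItemWriter;

    // Splits each chunk into segments and writes the segments in parallel,
    // waiting for all of them before returning control to the step.
    public class ParallelChunkWriter<T> implements ItemWriter<T> {

        private final ItemWriter<T> delegate; // the real writer, must be thread safe
        private final int threads;
        private final ExecutorService pool;

        public ParallelChunkWriter(ItemWriter<T> delegate, int threads) {
            this.delegate = delegate;
            this.threads = threads;
            this.pool = Executors.newFixedThreadPool(threads);
        }

        @Override
        public void write(List<? extends T> items) throws Exception {
            int segmentSize = (items.size() + threads - 1) / threads;
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int start = 0; start < items.size(); start += segmentSize) {
                List<? extends T> segment =
                        items.subList(start, Math.min(start + segmentSize, items.size()));
                futures.add(CompletableFuture.runAsync(() -> {
                    try {
                        delegate.write(segment);
                    } catch (Exception e) {
                        throw new IllegalStateException(e);
                    }
                }, pool));
            }
            // Propagates a failure, if any, once all segments have finished.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        }
    }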

The problem is with the ItemProcessor.process() method, as it processes item by item, so you are limited to a single item at a time and there isn't much threading you can do for a single item.

So the challenge is to write your own reader that can hand over a list of items to the processor instead of a single item, so that you can process those items in parallel, and the writer then works on a list of lists.
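
Something along these lines, again only a sketch (the class name and batch size handling are my own placeholders):

    import java.util.ArrayList;
    import java.util.List;

    import org.springframework.batch.item.ItemReader;

    // The "item" handed to the processor is now a whole batch, so the processor
    // can fan the batch out to its own worker threads.
    public class AggregatingItemReader<T> implements ItemReader<List<T>> {

        private final ItemReader<T> delegate; // the underlying single-item reader
        private final int batchSize;

        public AggregatingItemReader(ItemReader<T> delegate, int batchSize) {
            this.delegate = delegate;
            this.batchSize = batchSize;
        }

        @Override
        public List<T> read() throws Exception {
            List<T> batch = new ArrayList<>(batchSize);
            T item;
            while (batch.size() < batchSize && (item = delegate.read()) != null) {
                batch.add(item);
            }
            return batch.isEmpty() ? null : batch; // null signals end of input
        }
    }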

In all of this setup, you have to remember that the threads you spawn will be outside the read - process - write transaction boundary of Spring Batch, so you will have to take care of that on your own - merging the processed output from all threads, waiting until all threads are complete, and handling any errors. All in all, it's quite risky.

Making a item reader to return a list instead single object - Spring batch

Sabir Khan

Came across this with a similar problem at hand.

Here's how I am doing it at the moment. As @mminella suggested, a synchronized ItemReader with the FlatFileItemReader as the delegate. This works with decent performance: the code writes about ~4K records per second at the moment, but the speed doesn't depend entirely on the design; other attributes contribute as well.
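
A minimal sketch of that setup, using Spring Batch's SynchronizedItemStreamReader; the file path, line mapping, and item type below are placeholders:

    import org.springframework.batch.item.file.FlatFileItemReader;
    import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
    import org.springframework.batch.item.support.SynchronizedItemStreamReader;
    import org.springframework.core.io.FileSystemResource;

    // Serializes access to the non-thread-safe FlatFileItemReader so it can sit
    // inside a multi-threaded (task-executor backed) step.
    public class ReaderConfig {

        public SynchronizedItemStreamReader<String> synchronizedReader() {
            FlatFileItemReader<String> delegate = new FlatFileItemReader<>();
            delegate.setResource(new FileSystemResource("input/records.txt")); // placeholder path
            delegate.setLineMapper(new PassThroughLineMapper());
            delegate.setSaveState(false); // restart state is unreliable with concurrent chunks

            SynchronizedItemStreamReader<String> reader = new SynchronizedItemStreamReader<>();
            reader.setDelegate(delegate);
            return reader;
        }
    }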


I tried two other approaches to increase performance; both more or less failed.

  1. A custom synchronized ItemReader that aggregates, with FlatFileItemReader as the delegate, but I ended up maintaining a lot of state, which caused a performance drop. Maybe the code needed optimization, or plain synchronization is simply faster.
  2. Firing each insert PreparedStatement batch in a different thread, but it didn't increase performance much. I am still counting on this in case I run into an environment where individual threads for batches would give a significant performance boost.
Anonymous