Why does the read method execute multiple times in Spring Batch?

Question

I am currently using Spring Batch. I created a Reader, a Writer and a Processor. The Reader is a basic custom ListItemReader.

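import java.util.ArrayList;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.aop.support.AopUtils;
import org.springframework.batch.item.ItemReader;

public class CustomListItemReader<T> implements ItemReader<T> {
    // Assumed SLF4J logger; the original snippet did not show how `log` is declared.
    private static final Logger log = LoggerFactory.getLogger(CustomListItemReader.class);

    private List<T> list;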

    public List<T> getList() {
        return list;
    }

    public void setList(List<T> list) {
        log.debug("Set list of size {}", list.size());
        if (AopUtils.isAopProxy(list)) {
            this.list = list;
        } else {
            this.list = new ArrayList<T>(list);
        }
    }

    @Override
    public synchronized T read() {
        log.info("Inside custom list item reader");
        if (list != null && !list.isEmpty()) {
            log.info("Inside read not empty");
            T remove = list.remove(0);
            while (remove == null && !list.isEmpty()) {
                remove = list.remove(0);
            }
            return remove;
        }
        return null;
    }
}

I tested Spring Batch with and without a taskExecutor. Without the taskExecutor, the log message

Inside custom list item reader

gets printed twice. I get that: it is printed once for the actual job and once to check whether any input exists. When the reader returns null, the job completes and stops. That's fine, but when I do the same with a taskExecutor configured as shown below

public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setMaxPoolSize(1);
    taskExecutor.setCorePoolSize(1);
    taskExecutor.setQueueCapacity(1);
    taskExecutor.afterPropertiesSet();
    return taskExecutor;
}

and I even set the throttle-limit to 1. I assumed that the above taskExecutor mimics the single-threaded scenario: since there is only one active thread and throttle-limit = 1, the log would get printed twice, the same as in the previous configuration. But the message gets logged three times.

Why is there an extra log printed? How does the task count get increased by 1?

Also, just for the sake of experimenting, I set the throttle-limit to 20 and kept corePoolSize, maxPoolSize and queueCapacity at 1. The job doesn't end at all, and I get an exception:

java.util.concurrent.RejectedExecutionException: Task com.esewa.settlementswitch.transaction.cooperative.BatchConfig$ClientSettlementTaskDecorator$$Lambda$1111/698696362@4d3e6424 rejected from org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor$1@60e29dbd[Running, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 0]

I know that the task was rejected because the pool size is 1, the queue is also full, and no new tasks can be submitted. But the question is: why did so many tasks start?

On why the tasks are rejected when throttle-limit >> maxPoolSize, please check the explanation in this answer: https://stackoverflow.com/a/54744760/11398645 Basically, Spring Batch submitted more tasks than the executor is willing to accept. — Ashutosh, Jul 28 '21 at 19:13
@Ashutosh the above link does not answer my question. I know why my task was rejected, but I don't know why I got that error even though I was submitting a single task to a single-threaded task executor. — Mohendra Amatya, Jul 29 '21 at 03:52
As per the docs, the recommendation is to keep the pool size greater than the throttle limit. Ref: https://docs.spring.io/spring-batch/docs/4.1.x/api/org/springframework/batch/repeat/support/TaskExecutorRepeatTemplate.html#setThrottleLimit-int- `N.B. when used with a thread pooled TaskExecutor the thread pool might prevent the throttle limit actually being reached (so make the core pool size larger than the throttle limit if possible)` — Ashutosh, Jul 29 '21 at 05:53
Also, there are 2 separate things here. The number of tasklets created is decided by the Spring Batch framework. The throttle limit controls how many tasklets can run at a time. The thread pool provides the threads that can process the work. Both are independently configurable. — Ashutosh, Jul 29 '21 at 05:54

score 1 · Answer 1 · answered Aug 07 '21 at 13:24

The difference between the cases without and with TaskExecutor is that different RepeatOperations are used to control the execution of chunks.

In the sequential case without a user-defined TaskExecutor, one chunk will be executed in exactly the way you described: the read method of the reader will be invoked once for the single item in its list, and a second time for the return value null, which signals that no more items are available. The RepeatOperations that is used in this case is the RepeatTemplate, which executes the chunks sequentially and will not execute further chunks once a chunk has read null from its reader.
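The sequential behaviour can be illustrated with a small plain-Java model. This is only a sketch and not Spring Batch code: the class name SequentialReadSketch, the method countReads and the chunkSize parameter are made up for illustration, and the loop only mimics the idea that chunks run one after another and that no further chunk starts once a read has returned null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of sequential chunk execution (not the Spring Batch implementation).
class SequentialReadSketch {

    // Returns how many times "read" is invoked for the given input.
    // chunkSize is assumed to be at least 1.
    static int countReads(List<String> input, int chunkSize) {
        List<String> items = new ArrayList<>(input);
        int readCalls = 0;
        boolean exhausted = false;

        // Outer loop: one iteration per chunk, stops once a read returned null.
        while (!exhausted) {
            // Inner loop: read until the chunk is full or the reader is exhausted.
            for (int i = 0; i < chunkSize && !exhausted; i++) {
                readCalls++;
                String item = items.isEmpty() ? null : items.remove(0);
                if (item == null) {
                    exhausted = true;
                }
            }
        }
        return readCalls;
    }

    public static void main(String[] args) {
        // One item in the list: one read for the item, one read that returns null.
        System.out.println(countReads(Collections.singletonList("a"), 10)); // prints 2
    }
}
```

With a single item the model reports two reads, which matches the two log lines seen without a TaskExecutor.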

In the multi-threaded case with a TaskExecutor, the TaskExecutorRepeatTemplate will be used to execute the chunks instead. It will submit chunk executions to the TaskExecutor until either the throttle limit is reached or a result has been placed in its result queue.

With a single thread and throttle limit 1, the following happens: The TaskExecutorRepeatTemplate submits one chunk execution to its task executor and the chunk will execute in the single thread of the executor as described for the sequential case. Meanwhile, the TaskExecutorRepeatTemplate will continue to submit tasks. It will block while submitting the second chunk execution because of the throttle limit and it will only unblock when the first chunk execution has finished. But between the unblocking and the actual submission of a new chunk execution no check is performed whether the additional chunk execution is actually still required. In the second execution, the read method is only called once as it returns null the first time now.
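The sequence of events above can be reproduced with a plain-Java model. This is a sketch under assumptions, not the TaskExecutorRepeatTemplate source: a Semaphore stands in for the throttle limit, a queue stands in for the result queue, and all names (ThrottledRepeatSketch, countReads) are hypothetical. To make the outcome deterministic, the first chunk deliberately waits until the submitting thread is blocked on the throttle, which forces the interleaving described above; in a real job this ordering depends on timing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of "single thread, throttle limit 1" (not Spring Batch code).
class ThrottledRepeatSketch {

    static int countReads(List<String> input) {
        List<String> items = new ArrayList<>(input); // only touched by the single worker thread
        AtomicInteger readCalls = new AtomicInteger();
        Semaphore throttle = new Semaphore(1);       // throttle limit = 1
        BlockingQueue<Boolean> results = new LinkedBlockingQueue<>();
        AtomicBoolean firstChunk = new AtomicBoolean(true);
        ExecutorService executor = Executors.newSingleThreadExecutor();

        Runnable chunk = () -> {
            // Read until the reader returns null.
            String item;
            do {
                readCalls.incrementAndGet();
                item = items.isEmpty() ? null : items.remove(0);
            } while (item != null);
            // Forced interleaving: the first chunk finishes only after the
            // submitting thread is already waiting on the throttle.
            if (firstChunk.getAndSet(false)) {
                while (!throttle.hasQueuedThreads()) {
                    Thread.yield();
                }
            }
            results.add(Boolean.TRUE); // publish the result first ...
            throttle.release();        // ... then free the throttle slot
        };

        // Submission loop: keeps submitting until a result has arrived.
        do {
            throttle.acquireUninterruptibly(); // blocks while one chunk is in flight
            executor.execute(chunk);           // no re-check between unblocking and submitting
        } while (results.isEmpty());

        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return readCalls.get();
    }

    public static void main(String[] args) {
        // Two reads in the first chunk, one more in the superfluous second chunk.
        System.out.println(countReads(Collections.singletonList("a"))); // prints 3
    }
}
```

The third read comes from the second chunk, which is submitted right after the throttle unblocks even though the first chunk has already seen null.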

When you increase the throttle limit to 2 you should see 4 logs being printed, because the TaskExecutorRepeatTemplate will only be blocked when submitting the third chunk execution. The number of logs is actually not guaranteed, because it depends on the order of events that happen in different threads, but this effect should be well reproducible.
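The same over-submission is a plausible explanation for the RejectedExecutionException in the question: with a throttle limit of 20, submissions are not held back by the throttle, so the executor itself has to refuse them. The following plain-Java sketch shows only the executor side. It assumes that ThreadPoolTaskExecutor with a positive queueCapacity behaves like a java.util.concurrent.ThreadPoolExecutor backed by a bounded queue, and the names RejectionSketch and countRejected are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: pool size 1 and queue capacity 1 accept only two pending tasks.
class RejectionSketch {

    // Submits the given number of long-running tasks and counts the rejections.
    static int countRejected(int submissions) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;

        for (int i = 0; i < submissions; i++) {
            try {
                // Each task blocks until released, so no slot frees up during submission.
                executor.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }

        release.countDown(); // let the accepted tasks finish
        executor.shutdown();
        return rejected;
    }

    public static void main(String[] args) {
        // One task runs, one waits in the queue, the third is rejected.
        System.out.println(countRejected(3)); // prints 1
    }
}
```

In this model, the first task occupies the only thread, the second fills the queue, and every further submission is rejected until one of them completes.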
