I am investigating a Spring Batch job developed by someone else. Each step reads from the database in its reader, transforms the read JPA entities into DTOs in the processor, and the writer performs several operations on each of these DTOs that may involve database queries, third-party API calls, or both.
The JPQL used in the reader picks only 105 records (limit 105 is attached to the resulting SQL query): I am using org.springframework.batch.item.database.JpaPagingItemReader and passing pageSize=105. I have also overridden org.springframework.batch.item.database.AbstractPagingItemReader#getPage to always return 0, because the table continuously gets new entries inserted, and keeping getPage at its default implementation could risk missing some records. The JPQL in the reader itself takes care of filtering and ordering the relevant records.
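For context, the reader is wired up roughly like this (a minimal sketch; the entity name, the query, and the setSaveState call are illustrative assumptions, not the actual code):
import javax.persistence.EntityManagerFactory;
import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.context.annotation.Bean;

@Bean
public JpaPagingItemReader<JPAEntity> reader(EntityManagerFactory entityManagerFactory) {
    JpaPagingItemReader<JPAEntity> reader = new JpaPagingItemReader<JPAEntity>() {
        @Override
        public int getPage() {
            return 0; // always re-read the first page, since new rows keep arriving
        }
    };
    reader.setEntityManagerFactory(entityManagerFactory);
    // Illustrative JPQL; the real query does the filtering and ordering.
    reader.setQueryString("select e from JPAEntity e where e.status = 'NEW' order by e.createdAt");
    reader.setPageSize(105); // ends up as "limit 105" on the generated SQL
    reader.setSaveState(false); // assumption: restart state isn't useful with the page pinned to 0
    return reader;
}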
Each step is configured to process the read records in chunks of 15. However, the read_count and write_count are much higher than the record limit and chunk size configurations would suggest:
postgres_database=> select * from batch_step_execution where step_execution_id = 445124;
-[ RECORD 1 ]------+-------------------------
step_execution_id  | 445124
version            | 896
step_name          | step1
job_execution_id   | 278076
start_time         | 2023-01-27 16:08:02.074
end_time           | 2023-01-29 21:41:09.375
status             | COMPLETED
commit_count       | 894
read_count         | 13395
filter_count       | 0
write_count        | 13395
read_skip_count    | 0
write_skip_count   | 0
process_skip_count | 0
rollback_count     | 0
exit_code          | COMPLETED
exit_message       |
last_updated       | 2023-01-29 21:41:09.375
As can be seen, read_count is 13395 (while, as mentioned, a limit of 105 is attached to the reader JPQL), and write_count equals read_count. The counts are at least self-consistent with the chunk size: 13395 / 15 = 893 full chunks, which roughly lines up with commit_count = 894. I was thinking in terms of possible retries, but based on what I found about Spring Batch, rollback_count should then be > 0, since a retry is preceded by a rollback.
Adding the step configuration for reference:
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;

@Bean
public Step step1(
        StepBuilderFactory stepBuilderFactory,
        ItemReader<JPAEntity> reader,
        ItemWriter<DTO> writer) {
    return stepBuilderFactory
            .get("step1")
            .<JPAEntity, DTO>chunk(15)
            .reader(reader)
            .processor((ItemProcessor<JPAEntity, DTO>) Transformer::fromEntityToDto)
            .writer(writer)
            .faultTolerant()
            .noRollback(Exception.class) // note: no retry() / retryLimit() configured here
            .build();
}
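For comparison, my understanding is that item-level retries would require something like the following step body instead (a hypothetical sketch, not the actual config; TransientDataAccessException stands in for whatever exception would be retryable). The step above configures neither retry() nor retryLimit():
return stepBuilderFactory
        .get("step1")
        .<JPAEntity, DTO>chunk(15)
        .reader(reader)
        .processor((ItemProcessor<JPAEntity, DTO>) Transformer::fromEntityToDto)
        .writer(writer)
        .faultTolerant()
        .retry(org.springframework.dao.TransientDataAccessException.class) // hypothetical retryable exception
        .retryLimit(3)
        .build();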
I have been unable to figure out why this could be happening. The Spring Batch docs weren't much help, and I couldn't find anything similar discussed in any other thread (on Stack Overflow or anywhere else).
Any kind of help would be highly appreciated. Thank you.
I went through the Spring Batch docs and read a few articles and Stack Overflow answers, but found nothing. I even ran a simple Spring Batch job myself, but never saw this happen in that application.