I've been working on a Spring Boot batch job that pulls data from a DB2 database on an AS/400 system. The final data set is expected to be 1 million+ records, but for now I'm working with a smaller subset of 100,000 records. I configured a JdbcPagingItemReader with a pageSize of 40,000 and a fetchSize of 100,000, and the step chunks the data into chunks of 20,000. When I ran with logging set to trace, I noticed the query was modified with FETCH FIRST 40000 ROWS ONLY, which at first I didn't think was an issue, until I saw that the job retrieved only 40,000 records and then ended.
I may well be misunderstanding how paging is supposed to work in Spring Batch, but my assumption was that Spring uses the fetchSize as the total amount to retrieve per query, which would then be split into pages of 40,000 each, and each page would be chunked into 20,000 records per chunk. My end goal is to process all 100,000 records while maintaining high performance. Any help explaining how paging actually works would be much appreciated. Code examples below.
Spring Boot v2.3.3
Spring Batch Core v4.2.4
Step Bean
@Bean(name = "step-one")
public Step stepOne(@Autowired JdbcPagingItemReader<MyPojo> pagingItemReader) {
    return stepBuilderFactory.get("step-one")
            .<MyPojo, MyPojo>chunk(chunkSize)
            .reader(pagingItemReader)
            .processor(itemProcessor)
            .writer(itemWriter)
            .taskExecutor(coreTaskExecutor)
            .listener(chunkListener)
            .allowStartIfComplete(true)
            .build();
}
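The itemProcessor, itemWriter and chunkListener beans aren't paging-related, so I've left them out. The coreTaskExecutor is just a plain ThreadPoolTaskExecutor along these lines (the pool sizes shown here are illustrative, not my exact values):
@Bean(name = "core-task-executor")
public TaskExecutor coreTaskExecutor() {
    // Plain thread pool driving the multi-threaded step; sizes are illustrative only
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);
    executor.setMaxPoolSize(4);
    executor.setQueueCapacity(20);
    executor.setThreadNamePrefix("batch-step-");
    executor.initialize();
    return executor;
}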
JdbcPagingItemReader Bean
@StepScope
@Bean(name = "PagingItemReader")
public JdbcPagingItemReader<MyPojo> pagingItemReader(@Autowired PagingQueryProvider queryProvider) {
    return new JdbcPagingItemReaderBuilder<MyPojo>()
            .name("paging-reader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .rowMapper(mapper)
            .pageSize(pageSize)
            .fetchSize(fetchSize)
            .saveState(false)
            .build();
}
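The PagingQueryProvider injected above is built with Spring Batch's SqlPagingQueryProviderFactoryBean, roughly like this (the select/from clauses and sort key below are simplified placeholders, not my real table):
@Bean(name = "PagingQueryProvider")
public PagingQueryProvider queryProvider() throws Exception {
    // The factory inspects the DataSource metadata and returns the DB2-specific provider
    SqlPagingQueryProviderFactoryBean factory = new SqlPagingQueryProviderFactoryBean();
    factory.setDataSource(dataSource);
    factory.setSelectClause("SELECT ID, NAME, CREATED_TS");   // placeholder columns
    factory.setFromClause("FROM MYLIB.MY_TABLE");             // placeholder table
    factory.setSortKey("ID");                                  // paging needs a unique sort key
    return factory.getObject();
}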
Properties
#APPLICATION BATCH CONFIGURATION
application.batch.page-size=40000
application.batch.chunk-size=10000
application.batch.fetch-size=80000
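These properties are injected into the batch configuration class with @Value, along these lines:
@Value("${application.batch.page-size}")
private int pageSize;

@Value("${application.batch.chunk-size}")
private int chunkSize;

@Value("${application.batch.fetch-size}")
private int fetchSize;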