1

I've been working spring boot batch job to pull data from DB2 AS400 sys. The amount of records expected to be retrieved in the end game is 1 million+, but for now I'm working with a smaller subset of 100,000 records. I configured a JdbcPagingItemReader with a pageSize of 40,000 and a fetchSize of 100,000. The step for this is chunking the data into chunks of 20,000. When I ran with logging set to trace I notice that the query was modified with FETCH FIRST 40000 ROWS ONLY, which at first I didn't think was an issue until I noticed that the job was only retrieving 40,000 records and then ending. I could very well be missing understanding how paging is supposed to work with Spring Batch but my assumption was that Spring use the fetchSize as the total amount to retrieve per query, and then that would be split into pages of 40,000 each, and then that would be chunked into 20,000 records per chunk. My end goal is process all 100,000 records maintaining a high performance bar. Any help explaining how paging actually works would be most helpful. Code examples below.

Spring Boot v2.3.3 Spring Batch Core v4.2.4

Step Bean

    @Bean(name = "step-one")
    public Step stepOne(@Autowired JdbcPagingItemReader<MyPojo> pagingItemReader) {
        return stepBuilderFactory.get("step-one")
            .<Product.ProductBuilder, Product>chunk(chunkSize)
                .reader(pagingItemReader)
                    .processor(itemProcessor)
            .writer(itemWriter)
                .taskExecutor(coreTaskExecutor)
                    .listener(chunkListener)
            .allowStartIfComplete(true)
        .build();
    }

JdbcPagingItemReader Bean

    @StepScope
    @Bean(name = "PagingItemReader")
    public JdbcPagingItemReader<MyPojo> pagingItemReader(@Autowired PagingQueryProvider queryProvider) {
        return  new JdbcPagingItemReaderBuilder<MyPojo>().name("paging-reader")
            .dataSource(dataSource)
                .queryProvider(queryProvider)
                    .rowMapper(mapper)
            .pageSize(pageSize)
                .fetchSize(fetchSize)
            .saveState(false)
        .build();
    }

Properties

#APPLICATION BATCH CONFIGURATION
application.batch.page-size=40000
application.batch.chunk-size=10000
application.batch.fetch-size=80000
Joseph Freeman
  • 1,644
  • 4
  • 24
  • 43
  • Does this https://stackoverflow.com/a/58058009/5019386 answer your question? Are you updating items that might match the search criteria in your step? I'm thinking of something similar to https://stackoverflow.com/questions/58409247/spring-batch-not-processing-all-records or https://stackoverflow.com/questions/26509971/spring-batch-jpapagingitemreader-why-some-rows-are-not-read. `I noticed that the job was only retrieving 40,000 records and then ending`:If you have a minimal example that reproduces the issue (with a dockerized DB2 or an embeddable db), I can take a look. – Mahmoud Ben Hassine Mar 01 '21 at 13:24
  • Good morning! Thank you for your response! What do you mean by "Are you updating items that might match the search criteria in your step?" I'll took a look at the links above today and I'll try to make some time to create a example and upload it to GitHub. Thank you again for your response! – Joseph Freeman Mar 01 '21 at 13:52
  • I mean in your step (processor or writer), are you updating a field which is used in the `where` clause of your sql statement? Something similar to what's happening here: https://stackoverflow.com/questions/26509971/spring-batch-jpapagingitemreader-why-some-rows-are-not-read . I don't know if the same issue happens with the `JdbcPagingItemReader`, but this is what I'm trying to exclude. Anyway, a minimal example would really help for this kind of issues. – Mahmoud Ben Hassine Mar 01 '21 at 15:27
  • as i understand these 3 properties, it makes only sense: `fetchSize =< pageSize =< chunkSize`! (?) (https://docs.spring.io/spring-batch/docs/current/reference/html/index-single.html ...) – xerx593 Mar 01 '21 at 16:23
  • chunkSize: the size of 1 "transaction"/step (read, process & *write*), pageSize: the size of 1 "read", fetchSize: additional segmentation of page... – xerx593 Mar 01 '21 at 16:27
  • @MahmoudBenHassine No I'm not updating any fields that are used in the where clause. – Joseph Freeman Mar 01 '21 at 16:54

0 Answers0