
I have a Spring Batch step that reads from a file, processes the records, and writes to a file using chunk processing. The file is expected to contain millions of large records. I read that Spring Batch holds [chunk-size] processed records in memory before passing them to the writer.

To limit memory usage I kept the [chunk-size] small. However, this increases the number of updates the step makes to the BATCH_STEP_EXECUTION metadata table to record the read and commit counts.

Given that I am reading from and writing to local files, the updates to a remote database server are relatively expensive. If I increase the [chunk-size], memory usage goes up.
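To make the tradeoff concrete, here is a minimal sketch in plain Java (no Spring dependency). It assumes the step performs roughly one metadata update per chunk commit and that a whole chunk of processed records sits in memory before the write. The class name, method names, item count, and average record size are hypothetical and chosen only for illustration.

```java
public class ChunkTradeoff {

    // Approximate number of chunk commits (and so metadata updates)
    // needed for itemCount items: ceil(itemCount / chunkSize).
    public static long chunkCommits(long itemCount, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (itemCount + chunkSize - 1) / chunkSize;
    }

    // Rough estimate of bytes buffered per chunk, assuming every
    // processed record is about avgRecordBytes in size.
    public static long bufferedBytes(int chunkSize, long avgRecordBytes) {
        return (long) chunkSize * avgRecordBytes;
    }

    public static void main(String[] args) {
        long items = 5_000_000L;    // hypothetical record count
        long recordBytes = 20_000L; // hypothetical average record size

        for (int chunkSize : new int[] {10, 100, 1_000, 10_000}) {
            System.out.printf("chunk-size=%d commits=%d buffered-bytes=%d%n",
                    chunkSize,
                    chunkCommits(items, chunkSize),
                    bufferedBytes(chunkSize, recordBytes));
        }
    }
}
```

Under these assumptions, each tenfold increase in chunk size cuts the number of remote updates by about ten and raises the buffered memory by the same factor, which is the tension described above.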

The commit frequency doesn't matter much when writing local files, so it is the metadata updates that are the problem for me. The step is restartable from the beginning, so technically I have no need to log the intermediate commit counts.

I could just use a Map-based or in-memory database JobRepository, but I need the other information, such as the start/end times, to be persisted. Also, this concern applies to a single step only.

Are there any configuration parameters that turn off the intermediate commit count updates to the job repository, or that flush the chunk's records from memory to storage earlier while still committing only every [chunk-size] items? Basically, I am looking for something that separates chunk size from commit frequency.

uncaught_exception
  • You mentioned using an in-memory job repository; that's the way to go in your case, IMO. `I am looking if there is something that separates chunk-size from commit-frequency`: The commit frequency depends on the chunk size and the number of items in your data source. These are related concepts, and it is not possible to separate them. The chunk size is a tradeoff between memory usage and speed, as you described, and the "best" value can only be determined empirically. Hope this helps. – Mahmoud Ben Hassine Apr 15 '19 at 09:34

1 Answer


You can skip the metadata updates entirely by using MapJobRepositoryFactoryBean as your job repository:

    <bean id="jobRepository"
          class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
        <property name="transactionManager" ref="transactionManager" />
    </bean>

These answers might help you:

BATCH_WITHOUT_PERSISTING_METADATA

SKIP_METADATA_SAVE_SPRING_BATCH

Ashish Shetkar
  • I believe I have already addressed that in my question: "I could just use a map or in memory database for JobRepository but I need the other information such as the start/end times persisted and also this concern is only for a single step" – uncaught_exception Apr 12 '19 at 14:22
  • Well, if you want to track things like this, then I think you need to implement it on your own; I am not sure whether Spring Batch allows it to be done that way. You can implement some audit steps of your own to persist the start/end times and whatever else you need. – Ashish Shetkar Apr 14 '19 at 09:51