I have a question about how Spring Batch writes to databases through the ItemWriter<T> contract. To quote from The Definitive Guide to Spring Batch by Michael T. Minella:
All of the items are passed in a single call to the ItemWriter where they can be written out at once. This single call to the ItemWriter
allows for IO optimizations by batching the physical write. [...] Chunks are defined by their commit intervals. If the commit interval is set to 50 items, then your job reads in 50 items, processes 50 items, and then writes out 50 items at once.
Yet when I use, say, HibernateItemWriter or JpaItemWriter in a chunk-oriented step to write to the database in a Spring Boot app with all the Spring Batch infrastructure in place (@EnableBatchProcessing, StepBuilderFactory/JobBuilderFactory, etc.), together with monitoring tools that count insert/update statements (such as implementations of the MethodInterceptor interface), I notice that the number of inserts performed by the writer equals the total number of records to process, not the number of chunks configured for the job.
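For context, the step I have in mind is configured more or less like this; this is a minimal sketch, and MyItem, the bean names and the chunk size are placeholders rather than my exact code:

    import javax.persistence.EntityManagerFactory;

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.database.JpaItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class WriteStepConfig {

        // Chunk-oriented step: read 5 items, process them, then hand all 5
        // to the writer in a single write() call.
        @Bean
        public Step writeStep(StepBuilderFactory stepBuilderFactory,
                              ItemReader<MyItem> reader,
                              JpaItemWriter<MyItem> writer) {
            return stepBuilderFactory.get("writeStep")
                    .<MyItem, MyItem>chunk(5) // commit interval
                    .reader(reader)
                    .writer(writer)
                    .build();
        }

        // JpaItemWriter that delegates to the JPA EntityManager for the inserts.
        @Bean
        public JpaItemWriter<MyItem> jpaItemWriter(EntityManagerFactory entityManagerFactory) {
            JpaItemWriter<MyItem> writer = new JpaItemWriter<>();
            writer.setEntityManagerFactory(entityManagerFactory);
            return writer;
        }
    }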
For example, inspecting the logs in IntelliJ for a job execution of 10 items with a chunk size of 5, I found 10 insert statements of the form

Query:["insert into my_table (fields...

instead of the 2 I expected (one batched insert per chunk). I also checked for insert statements in the general_log_file of my RDS instance and found two 'Prepare insert' statements and one 'Execute insert' statement for each item to process.
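The statement counting mentioned above is done with something along these lines; this is a hypothetical sketch of a MethodInterceptor-based counter, and the wiring of the proxy onto the JDBC objects is omitted:

    import java.util.concurrent.atomic.AtomicLong;

    import org.aopalliance.intercept.MethodInterceptor;
    import org.aopalliance.intercept.MethodInvocation;

    // Hypothetical interceptor meant to sit on a proxied JDBC object so that
    // every executeUpdate call (i.e. every physical insert/update) is counted.
    public class InsertCountingInterceptor implements MethodInterceptor {

        private final AtomicLong executedStatements = new AtomicLong();

        @Override
        public Object invoke(MethodInvocation invocation) throws Throwable {
            if ("executeUpdate".equals(invocation.getMethod().getName())) {
                executedStatements.incrementAndGet();
            }
            return invocation.proceed();
        }

        public long getExecutedStatements() {
            return executedStatements.get();
        }
    }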
Now I understand that a writer such as JpaItemWriter<T>, in its write(List<? extends T> items) method, loops through the items calling entityManager.persist(item) or entityManager.merge(item), thereby inserting a new row into the corresponding table for each item, and eventually calls entityManager.flush().
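Paraphrasing from what I can see in the source (this is not the actual Spring Batch code, just my reading of it), write(...) does roughly this:

    // Simplified paraphrase of JpaItemWriter#write; entityManagerFactory is a field
    // of the writer, set via setEntityManagerFactory(...), and EntityManagerFactoryUtils
    // is org.springframework.orm.jpa.EntityManagerFactoryUtils.
    public void write(List<? extends T> items) {
        EntityManager entityManager =
                EntityManagerFactoryUtils.getTransactionalEntityManager(entityManagerFactory);
        for (T item : items) {
            entityManager.merge(item); // or persist(item): one JPA operation per item
        }
        entityManager.flush(); // the pending inserts are pushed to the database here
    }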
But where is the performance gain provided by the batch processing, if there is any?