I am facing an issue similar to the one described in this question: "Spring batch jpaPagingItemReader why some rows are not read?"
I have to read records from a table with a query like
SELECT * FROM TABLE1 WHERE COLUMN1 IS NULL
and, based on the values of other columns, do some processing (connect to a REST service), fetch some data, and write it back to COLUMN1.
Since I am using RepositoryItemReader with a taskExecutor, the paginated fetch does not work correctly: around half of the eligible records are skipped.
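A minimal sketch of how a filtered query can lose rows under offset-based paging, independent of Spring Batch and of threading. It assumes the reader re-runs the filtered query for each page with an offset of page number times page size, and that processed rows stop matching the filter before the next page is read. The class `PagingSkipDemo`, the method `countProcessed`, and the in-memory list standing in for TABLE1 are hypothetical names used only for illustration; this is not the actual RepositoryItemReader implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class PagingSkipDemo {

    // Simulates reading "WHERE COLUMN1 IS NULL" page by page while each
    // processed row has its COLUMN1 filled in (so it leaves the result set).
    // Returns how many rows were actually processed.
    public static int countProcessed(int totalRows, int pageSize) {
        // Hypothetical stand-in for TABLE1: index = row id, value = COLUMN1.
        List<String> column1 = new ArrayList<>();
        for (int i = 0; i < totalRows; i++) {
            column1.add(null);
        }

        int processed = 0;
        for (int page = 0; ; page++) {
            // Re-run the filtered query: collect ids whose COLUMN1 is still null.
            List<Integer> eligible = new ArrayList<>();
            for (int id = 0; id < column1.size(); id++) {
                if (column1.get(id) == null) {
                    eligible.add(id);
                }
            }

            // Offset-based paging over the *current* (shrinking) result set.
            int from = page * pageSize;
            if (from >= eligible.size()) {
                break; // the reader sees an empty page and stops
            }
            int to = Math.min(from + pageSize, eligible.size());

            // Process and update the rows of this page.
            for (int id : new ArrayList<>(eligible.subList(from, to))) {
                column1.set(id, "done");
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(countProcessed(100, 10));  // prints 50
        System.out.println(countProcessed(100, 100)); // prints 100
    }
}
```

With 100 eligible rows and a page size of 10, only 50 rows are processed, because each update shifts the remaining rows forward while the offset keeps advancing. With a page size of 100, a single page covers everything and nothing is skipped. This only models the filtered-query case and does not explain skips seen with an unfiltered query.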
To avoid this issue, I reduced the query to SELECT * FROM TABLE1
so that pagination is not affected, since the query now returns the same rows every time it runs. I put a check in the code to skip a record if COLUMN1 is not null. Even with this setup, records are still being skipped.
Also, this problem only occurs when the page size is smaller than the actual number of records.
If I keep the page size greater than or equal to the total number of eligible records, I don't face any issues. I am not sure if such a large page size (~100000) is a wise thing to have.
I noticed that in that case a single query retrieves all the records, and the taskExecutor threads then process and write those records in parallel.
Due to the high volume of data, I cannot avoid multi-threading, as single-threaded mode is dreadfully slow.
Any pointers on what can be done?