
I am facing an issue similar to the one described in this question: Spring batch jpaPagingItemReader why some rows are not read?

I have to read some records from a table like

SELECT * FROM TABLE1 WHERE COLUMN1 IS NULL

and, based on the values of other columns, do some processing (call a REST service), fetch some data, and write it to COLUMN1.

Since I am using RepositoryItemReader with a taskExecutor, the paginated fetch does not work correctly: around half of the eligible records are skipped.

To avoid this, I reduced the query to SELECT * FROM TABLE1 so that pagination is not affected, since the query is then idempotent, and I added a check in the code to skip a record if the column is not null. Even with this setup, records are still being skipped.

The problem only occurs when the page size is smaller than the number of records. If the page size is greater than or equal to the total number of eligible records, I don't face any issues, but I am not sure whether such a large page size (~100000) is wise. In that case I noticed that a single query retrieves all the records, and the taskExecutor threads then process and write them in parallel. Because of the high volume of data, I cannot avoid multi-threading; single-threaded mode is dreadfully slow.

Any pointers on what can be done?
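To make the "about half are skipped" symptom concrete, here is a minimal plain-Java sketch of offset paging over the original `COLUMN1 IS NULL` query. It does not use Spring Batch at all; the class `PagingSkipDemo`, the in-memory `TreeMap` standing in for TABLE1, and the `"done"` marker are all hypothetical names chosen for illustration. It only simulates what happens when each page's rows are updated (and so drop out of the result set) before the next page is fetched by offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PagingSkipDemo {

    // Simulates: SELECT id FROM TABLE1 WHERE COLUMN1 IS NULL
    //            ORDER BY id LIMIT pageSize OFFSET page * pageSize
    public static List<Integer> readPage(Map<Integer, String> table, int page, int pageSize) {
        List<Integer> eligible = new ArrayList<>();
        for (Map.Entry<Integer, String> e : table.entrySet()) {
            if (e.getValue() == null) {
                eligible.add(e.getKey());
            }
        }
        int from = page * pageSize;
        if (from >= eligible.size()) {
            return new ArrayList<>();
        }
        int to = Math.min(from + pageSize, eligible.size());
        return new ArrayList<>(eligible.subList(from, to));
    }

    // Reads page after page and "writes" COLUMN1 for every row of a page
    // before the next page is fetched, as a chunk-oriented step would.
    public static List<Integer> run(int rows, int pageSize) {
        Map<Integer, String> table = new TreeMap<>();
        for (int id = 1; id <= rows; id++) {
            table.put(id, null); // COLUMN1 is NULL for every row at the start
        }
        List<Integer> processed = new ArrayList<>();
        for (int page = 0; ; page++) {
            List<Integer> ids = readPage(table, page, pageSize);
            if (ids.isEmpty()) {
                break;
            }
            for (int id : ids) {
                table.put(id, "done"); // the row no longer matches IS NULL
                processed.add(id);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(8, 2)); // prints [1, 2, 5, 6]
        System.out.println(run(8, 8)); // prints [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

With 8 rows and a page size of 2, rows 3, 4, 7 and 8 are never read, because the offset keeps advancing while the result set shrinks underneath it. With a page size of 8 everything is read in one query, which matches the large-page-size observation above. This sketch does not model the multi-threaded case or the reduced `SELECT * FROM TABLE1` query.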

bluelurker
  • Does this answer your question? https://stackoverflow.com/questions/68744338/spring-batch-itemreader-skips-the-first-record-on-the-second-page/68745801#68745801 – Henning Sep 07 '21 at 15:06
  • Yes, I am aware of the effects of using a non-idempotent query. I removed the columns that change from the query and moved that check to the processor. If I use a taskExecutor, will each thread read its own chunk? I mean, will each thread have its own read, process, and write flow? How can I ensure that, in a multi-threaded environment, each thread reads its page in a synchronized way? – bluelurker Sep 07 '21 at 15:56

1 Answer


You are basically trying to implement the process indicator pattern in a multi-threaded step. There is a sample for that here: Parallel sample. The idea is to use a staging table with the process indicator instead of modifying the original table.

That said, I am not sure whether the process indicator pattern can be implemented with a paging technique. In my opinion, a partitioned step in which each partition is read sequentially with a cursor-based reader is a better option.
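As a rough illustration of the partitioning idea, the plain-Java sketch below splits an ID range into contiguous, non-overlapping sub-ranges, one per partition. The class `IdRangePartitions`, the `split` method, and the assumption that TABLE1 has a numeric `id` column are hypothetical; in an actual Spring Batch job, a `Partitioner` implementation would place each range's bounds into that partition's `ExecutionContext`, and each partition's cursor-based reader would select only the rows within its own bounds.

```java
import java.util.ArrayList;
import java.util.List;

public class IdRangePartitions {

    // Splits [minId, maxId] into at most gridSize contiguous ranges.
    // Each element is {startInclusive, endInclusive}.
    public static List<long[]> split(long minId, long maxId, int gridSize) {
        List<long[]> ranges = new ArrayList<>();
        if (gridSize <= 0 || maxId < minId) {
            return ranges; // nothing to partition
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        for (long start = minId; start <= maxId; start += size) {
            long end = Math.min(start + size - 1, maxId);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each range would back a query such as:
        //   SELECT * FROM TABLE1 WHERE id BETWEEN ? AND ?
        for (long[] r : split(1, 10, 4)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

Because the ranges are fixed up front and do not overlap, each thread works on its own slice and updating COLUMN1 in one slice cannot shift the rows another thread is about to read.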

Mahmoud Ben Hassine