
REQUIREMENT:

A SELECT query may return around 200K records. We need to fetch the records, process them, and write the processed results back to the database.

Database Used: NuoDB

PROBLEM:

I have read in NuoDB documentation:

Results of queries to the transaction engine will be buffered back to the client in rows that fit into less than 100 KB. Once the client iterates through a set of fetched results, it will go back to the TE to get more results.

I am not sure whether the database engine can return 200K records at once, and holding 200K records in a List variable does not seem ideal either. The SELECT query has a date field in the WHERE clause, like:

SELECT * FROM Table WHERE DATE >= '2020-06-04 00:00:00' AND DATE < '2020-06-04 23:00:00'

The above query may result in 200K records.

I thought of dividing the query into smaller time windows:

SELECT * FROM Table WHERE DATE >= '2020-06-04 00:00:00' AND DATE < '2020-06-04 10:00:00'
SELECT * FROM Table WHERE DATE >= '2020-06-04 10:00:00' AND DATE < '2020-06-04 18:00:00'
SELECT * FROM Table WHERE DATE >= '2020-06-04 18:00:00' AND DATE < '2020-06-04 23:00:00'

But I am not sure whether this approach is ideal. Please advise.

Now consider three classes A, B, and C that implement ItemReader, ItemProcessor, and ItemWriter respectively:

        .get("stepToReadDataFromTable")
        .<Entity1, Entity2>chunk(2000)        
        .reader(A()) 
        .processor(B())
        .writer(C())
        .build();

Can we do it like this: class A extracts 2000 of the 200K records, class B processes them, and once 2000 records have been processed (the chunk size above), class C writes them to the database. This loop continues until all 200K records are processed.
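
In other words, I expect the chunk loop to behave conceptually like the following simplified sketch (my own illustration, assuming Spring Batch 4's ItemReader/ItemProcessor/ItemWriter signatures; the real framework additionally handles transactions, restarts, retries, and skips):

    import java.util.ArrayList;
    import java.util.List;

    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;

    // Simplified sketch of the chunk loop; not the actual framework code.
    public class ChunkLoopSketch {

        static <I, O> void runChunkLoop(ItemReader<I> reader,
                                        ItemProcessor<I, O> processor,
                                        ItemWriter<O> writer,
                                        int chunkSize) throws Exception {
            List<O> buffer = new ArrayList<>();
            I item;
            while ((item = reader.read()) != null) {    // A: one record at a time
                O processed = processor.process(item);  // B: transform each record
                if (processed != null) {                // null means "filter this record out"
                    buffer.add(processed);
                }
                if (buffer.size() == chunkSize) {
                    writer.write(buffer);               // C: write the whole chunk at once
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                writer.write(buffer);                   // flush the last partial chunk
            }
        }
    }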

If yes, how can we achieve it? Is there any way to extract data from the SELECT query in chunks?

Pratim Singha
  • Your last code snippet is the way to go. Spring Batch is exactly meant for handling such use cases. Give it the whole query (no need to split it) and define the chunk size as you did. If you want the 2000 records to be committed at once, you must define the commit interval accordingly. Also use a paginated item reader and specify a page size that matches the commit interval. – alainlompo Jun 04 '20 at 19:02

1 Answer


You can use a paging item reader to read items in pages instead of loading the entire data set in memory.
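
For illustration, here is a rough sketch of such a reader using JdbcPagingItemReaderBuilder (Spring Batch 4). The table, columns, and Entity1 type are taken from the question; the ID sort key is an assumption (a paging reader needs a unique, ordered sort key), and since NuoDB is not among the databases Spring Batch auto-detects for paging SQL generation, you may need to configure a PagingQueryProvider explicitly:

    import java.util.Map;

    import javax.sql.DataSource;

    import org.springframework.batch.item.database.JdbcPagingItemReader;
    import org.springframework.batch.item.database.Order;
    import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;
    import org.springframework.jdbc.core.BeanPropertyRowMapper;

    public class ReaderConfig {

        // Reads the result set page by page: only pageSize rows are held in
        // memory at a time, so the 200K rows are never loaded all at once.
        public JdbcPagingItemReader<Entity1> pagingReader(DataSource dataSource) {
            return new JdbcPagingItemReaderBuilder<Entity1>()
                    .name("entityReader")
                    .dataSource(dataSource)
                    .selectClause("SELECT *")
                    .fromClause("FROM Table")
                    .whereClause("WHERE DATE >= :start AND DATE < :end")
                    // assumed unique, indexed sort key; required for stable paging
                    .sortKeys(Map.of("ID", Order.ASCENDING))
                    .parameterValues(Map.of(
                            "start", "2020-06-04 00:00:00",
                            "end", "2020-06-04 23:00:00"))
                    .pageSize(2000) // match the chunk size / commit interval
                    .rowMapper(new BeanPropertyRowMapper<>(Entity1.class))
                    .build();
        }
    }

Setting the page size to match the chunk size means each chunk corresponds to exactly one page query and one commit, as suggested in the comment above.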

This is explained in the chunk-oriented processing section of the reference documentation.

This seems related to "Will the SELECT Query be able to retrieve 200K records in Spring Batch".

Mahmoud Ben Hassine