
I am using the Java ExecutorService to transfer data from one DB to another.

ExecutorService executor = Executors.newFixedThreadPool(10);

Once the executor service is initialized, records are retrieved from the source DB in batches of 500, and those records are indexed into the target DB.

while (true) {
    List<?> docs = getDataFromSourceDB();
    if (docs.size() > 0) {
        IndexdataToTargetDBThread thread = new IndexdataToTargetDBThread(docs);
        executor.submit(thread);
    } else {
        break;
    }
}

executor.shutdown();
while (!executor.isTerminated()) {
}

The function getDataFromSourceDB() returns 500 records on each call, and when no more records are available it returns an empty list. The class IndexdataToTargetDBThread implements Runnable; it indexes the given documents into the target DB.
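
For context, a minimal sketch of what such a class might look like (the real class is not shown here; indexDocsToTargetDB is a hypothetical stand-in for the actual indexing logic):

import java.util.List;

class IndexdataToTargetDBThread implements Runnable {
    private final List<?> docs;

    IndexdataToTargetDBThread(List<?> docs) {
        this.docs = docs;
    }

    @Override
    public void run() {
        // Hypothetical stand-in for the actual target-DB write.
        indexDocsToTargetDB(docs);
    }

    private void indexDocsToTargetDB(List<?> docs) {
        // Open a connection to the target DB, index the batch, close.
    }
}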

An OutOfMemoryError occurs when the above code is run. The source DB has close to 1M records, so the while loop runs around 2,000 times. After around 700 iterations, the OutOfMemoryError occurs and the entire DB transfer fails. Is there a more efficient way to handle this?

user3383301
  • Maybe don't keep so many jobs outstanding at once. Keep track of the number of jobs outstanding, and if it's more than, say, 100, wait before queuing more. They're not going to get executed until the previous jobs are finished anyway. – David Schwartz Mar 25 '21 at 05:44
  • You are not using your executor to throttle the load. You're fetching all the data while submitting to the thread pool. You should call `getDataFromSourceDB()` in the Runnable's `run()` method so that a max of (500*10) records can be loaded at any given time. – ernest_k Mar 25 '21 at 05:44
  • Can you give me the content of the `IndexdataToTargetDBThread` class? Maybe you did not call `EntityManager.clear()`. – dung ta van Mar 25 '21 at 06:34
  • @dungtavan: That has some confidential information that I am not able to mask and share. However, writing to the target DB is handled well. EntityManager.clear() is not used. DB connections are properly opened, processed, and closed. Like ernest mentioned, I'm trying to throttle the inflow of data. – user3383301 Mar 25 '21 at 06:42
  • By the way… Replace that tight loop `while (!executor.isTerminated()) {}` with `executor.awaitTermination()`. – Basil Bourque Mar 25 '21 at 07:39
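
A minimal sketch of that last suggestion, replacing the busy-wait (TimeUnit is java.util.concurrent.TimeUnit, and the one-hour timeout is an arbitrary illustration; pick a bound that fits the expected transfer time):

executor.shutdown();
try {
    // Block until all submitted tasks finish instead of spinning.
    if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
        executor.shutdownNow(); // give up and cancel what is left
    }
} catch (InterruptedException e) {
    executor.shutdownNow();
    Thread.currentThread().interrupt();
}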

1 Answer


I see two problems. First, the amount of data fetched from the DB is unbounded: batches are fetched and submitted as fast as the loop can run, regardless of how quickly the workers consume them. You should restrict the number of records that exist in the application at the same time. Create a buffer (it could be a queue with a limited number of elements) and flush each batch once it is ready to be written: https://docs.oracle.com/javase/8/docs/api/java/util/Queue.html
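
A minimal sketch of that idea, reusing `getDataFromSourceDB()` and `IndexdataToTargetDBThread` from the question: give the pool a bounded work queue and a `CallerRunsPolicy`, so that when the queue is full the submitting thread runs the task itself and fetching pauses until the workers catch up. The 20-slot queue size is an assumption; tune it to your heap.

import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// 10 workers, at most 20 batches waiting in the queue. When the queue
// is full, CallerRunsPolicy makes the submitting thread execute the
// rejected task itself, which naturally throttles the fetch loop.
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        10, 10,
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(20),
        new ThreadPoolExecutor.CallerRunsPolicy());

while (true) {
    List<?> docs = getDataFromSourceDB();
    if (docs.isEmpty()) {
        break;
    }
    executor.execute(new IndexdataToTargetDBThread(docs));
}

executor.shutdown();

With those numbers, at most about 31 batches (10 executing, 20 queued, 1 on the submitting thread), roughly 15,500 records, are alive at once instead of all 1M.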

Second, I am not sure about the database's isolation level. This article describes some issues with concurrent inserts in PostgreSQL: https://www.postgresql.org/files/developer/concurrency.pdf
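
If the concurrent inserts do contend, the isolation level can be set per connection in plain JDBC. A sketch: `connection` here is whatever connection each indexing task uses, and READ_COMMITTED is only an example value.

import java.sql.Connection;

// READ_COMMITTED is an example; pick a level per the linked article.
connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);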

Also, you can try to increase the Java heap limit via the -Xmx option: Increase heap size in Java
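
For example (the 4g figure and the jar name are only illustrations; size the heap to the host):

java -Xmx4g -jar db-transfer.jar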

But that's not the best way :)