
I would like your help improving the running time of the following procedure in Java. The steps of the procedure are:

  • I have a table with more than a million records (primary key is auto increment).
  • I select the min and max primary key value from this table.
  • I create some initial 'fromRange' and 'toRange' variables based on the min and max values
  • Then I create a loop where I process 20,000 records each time:

    I fetch the records between 'fromRange' up to 'toRange'

    For each record returned, I append it to an XML object (using JAXB)

    Then I write the XML object to a file on disk.

    I increase 'fromRange' and 'toRange' to continue with the next records.

The procedure ends after all records have been processed.
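The loop described above can be sketched roughly as follows. This is only an illustrative skeleton: `fetchRange` stands in for the real JDBC query, and the ids are fabricated in memory instead of coming from a database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the range-batch export loop described above.
public class RangeExportSketch {

    static final int BATCH_SIZE = 20_000;

    // Placeholder for "SELECT ... WHERE id BETWEEN fromRange AND toRange"
    static List<Long> fetchRange(long fromRange, long toRange, long maxId) {
        List<Long> records = new ArrayList<>();
        for (long id = fromRange; id <= Math.min(toRange, maxId); id++) {
            records.add(id);
        }
        return records;
    }

    // Walks the key space in BATCH_SIZE windows and returns how many
    // records were handed to the (omitted) XML-writing step.
    static long processAll(long minId, long maxId) {
        long fromRange = minId;
        long processed = 0;
        while (fromRange <= maxId) {
            long toRange = fromRange + BATCH_SIZE - 1;
            for (Long record : fetchRange(fromRange, toRange, maxId)) {
                processed++; // real code would append 'record' to the JAXB object here
            }
            fromRange = toRange + 1; // advance to the next window
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processAll(1, 1_000_000));
    }
}
```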

This execution takes more than 12 hours to finish on a normal PC. I was wondering how I can improve this code to export the files faster. Maybe using threading?

Thanks

Bad_Pan

4 Answers


I fetch the records between 'fromRange' up to 'toRange'

and

After I write the XML object created on a file on a disk.

are I/O steps that block computation. Multithreading is one way to ensure your machine's resources are used optimally.

Of course, you should profile this yourself and confirm that the thread is blocked much of the time. If so, then yes, multithreading is a valid approach.
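One way to overlap the two blocking I/O steps (database fetch and file write) is a bounded producer/consumer queue, so neither side sits idle while the other blocks. A minimal sketch, with both I/O steps faked in memory (the fetcher and writer bodies are placeholders for the real JDBC and JAXB code):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: pipeline the fetch and write steps with a bounded queue.
public class PipelinedExport {

    private static final long POISON = -1L; // sentinel that tells the writer to stop

    static long run(long recordCount) throws InterruptedException {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(1024);
        AtomicLong written = new AtomicLong();

        Thread fetcher = new Thread(() -> { // stands in for the JDBC range queries
            try {
                for (long id = 1; id <= recordCount; id++) {
                    queue.put(id);
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread writer = new Thread(() -> { // stands in for the JAXB/file-append step
            try {
                while (true) {
                    long record = queue.take();
                    if (record == POISON) break;
                    written.incrementAndGet(); // real code would marshal 'record' here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        fetcher.start();
        writer.start();
        fetcher.join();
        writer.join();
        return written.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(100_000));
    }
}
```

The bounded queue also provides backpressure: if writing is the slow step, the fetcher blocks on `put` instead of filling memory.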

djechlin

Comments:

  • I have a table with more than a million records (primary key is auto increment).

That is fine: since it is the primary key, most DBMSs index it automatically.

  • I select the min and max primary key value from this table.

You might do this via your DBMS's first-row/last-row functions. That query is highly selective and should not take long.

  • I create some initial 'fromRange' and 'toRange' variables based on the min and max values

Most modern DBMSs store their indices as B*-trees. This is a tree structure that is very fast at finding a value, and its leaves are linked together in a list, which makes finding a range fast as well. So this should also be selective and not take much time.

  • After I create a loop where I process 20000 records each time

I would try to build up the Java object first and do the serialization via JAXB only once, at the very end.

In general, you need to do some tracing to see which step consumes most of the time.


It would definitely be better to do the work in parallel. Keep the main thread reading from the database, but read all the records at once, i.e. select * from MyTable order by myId.

Then create an ExecutorService by calling one of the methods from the Executors factory, such as newCachedThreadPool.

Then, in the main thread, keep looping over the records and for each one call executor.submit(new Runnable() { public void run() { doYourWork(record); } });. Note that the record must be a copy, as it will be accessed from a different thread!

At the end, call executor.shutdown() and executor.awaitTermination(). You can check for errors by calling get() on the Futures returned by the submit method.
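Put together, the pattern looks like the sketch below. The records are faked as plain longs here, and `doYourWork` is a placeholder for the asker's per-record processing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the submit/shutdown/awaitTermination pattern described above.
public class ExecutorExportSketch {

    static final AtomicInteger processedCount = new AtomicInteger();

    static void doYourWork(long record) {
        processedCount.incrementAndGet(); // real code would build the XML here
    }

    static int processAll(int recordCount) throws Exception {
        processedCount.set(0);
        ExecutorService executor = Executors.newCachedThreadPool();
        List<Future<?>> futures = new ArrayList<>();
        for (long id = 1; id <= recordCount; id++) {
            final long record = id; // a copy: never share a mutable row across threads
            futures.add(executor.submit(() -> doYourWork(record)));
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        for (Future<?> f : futures) {
            f.get(); // rethrows any exception a worker task hit
        }
        return processedCount.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(10_000));
    }
}
```

Note that a cached pool has no upper bound on threads; for millions of cheap tasks, a fixed-size pool (Executors.newFixedThreadPool) may be the safer choice.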

Alternatively, if you want a more advanced solution, you may consider using Apache Camel for this, specifically its SQL example.

Zbynek Vyskovsky - kvr000

Your question is incomplete: no total count, no database type, no information about record size. But in general:

  1. Do not use max/min: just select all records and iterate over them
  2. Pay attention to the fetch size parameter in JDBC. That is where you should set 20000
  3. Use JAXB in streaming mode (see JAXB fragment)
  4. Do not forget about OutputStream buffering
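Points 3 and 4 amount to writing the XML as a stream through a buffer rather than building one giant object. The sketch below illustrates the shape with the JDK's built-in StAX API (so it is self-contained); with JAXB you would instead set the Marshaller's JAXB_FRAGMENT property and marshal one record element at a time into the same writer. The byte-array sink stands in for the real file:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// Sketch of streaming XML output with buffering; no per-record objects are retained.
public class StreamingXmlSketch {

    static String export(int recordCount) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the file
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 64 * 1024);
        XMLStreamWriter xml = XMLOutputFactory.newFactory()
                .createXMLStreamWriter(buffered, "UTF-8");

        xml.writeStartDocument("UTF-8", "1.0");
        xml.writeStartElement("records");
        for (int id = 1; id <= recordCount; id++) { // one element per row
            xml.writeStartElement("record");
            xml.writeAttribute("id", Long.toString(id));
            xml.writeCharacters("row " + id); // placeholder for the real column data
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close();
        buffered.flush();
        return sink.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(export(3));
    }
}
```

Because each record is written and discarded immediately, memory use stays flat no matter how many rows the table has.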
sibnick