
I'm a noob in Hibernate and I have to read 2 million records from a DB2 z/OS database with Hibernate in Java (JDBC).

My problem is that I run into an OutOfMemoryError after about 150,000 records.

I've heard about batching etc., but I only find solutions for actually inserting new records. What I want to do is read these records into an ArrayList for further usage.

So I'm actually just selecting a single column to reduce the amount of data:

getEntityManager().createQuery("select t.myNumber from myTable t").getResultList();

It would also be interesting to know whether there is a better way to read such a huge number of records (maybe without Hibernate?).

Christian
  • You cannot, or rather should not, load that many records into memory. You should do it in batches, i.e. load a batch of 100 (depends), process it, and then take the next 100. – bitkot Jul 01 '14 at 07:59
  • Try reading about [Apache Solr](https://wiki.apache.org/solr/) – Nikhil Talreja Jul 01 '14 at 08:00
  • a) don't use Hibernate for even slightly large rowsets, b) use raw SQL and a cursor to scroll through the results, so there's only one row in memory at a time (see the JDBC sketch below) – Bohemian Jul 01 '14 at 08:05
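
For illustration, a minimal plain-JDBC sketch of the cursor approach from the last comment. The connection details are placeholders, the table/column names are guessed from the question's query, and the fetch size is just an illustrative value:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CursorRead {
    // jdbcUrl, user and password are placeholders for your DB2 connection details
    static void readAll(String jdbcUrl, String user, String password) throws SQLException {
        try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
             Statement st = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                     ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(1000); // driver hint: fetch rows in blocks, not all at once
            try (ResultSet rs = st.executeQuery("SELECT MYNUMBER FROM MYTABLE")) {
                while (rs.next()) {
                    long myNumber = rs.getLong(1);
                    // process one value at a time; only the current fetch block is in memory
                }
            }
        }
    }
}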

6 Answers


The following is one way to do batch processing with Hibernate. Keep in mind this is not 100% tested; it's pseudo-logic.

int i = 0;
final int batch = 100;
// note: add an "order by" to the query if you need stable page boundaries
List<?> numList = getEntityManager().createQuery("select t.myNumber from myTable t")
        .setFirstResult(i).setMaxResults(batch).getResultList();

// loop until a page comes back empty, so the last (partial) page is processed too
while (!numList.isEmpty()) {
    // process numList
    i += batch;
    numList = getEntityManager().createQuery("select t.myNumber from myTable t")
            .setFirstResult(i).setMaxResults(batch).getResultList();
}

Hibernate documentation for setFirstResult() and setMaxResults()

bitkot

A better approach is to use a StatelessSession (no dealing with caches) and bulk reads via ScrollableResults:

StatelessSession statelessSession = sessionFactory.openStatelessSession();
try {
    ScrollableResults scrollableResults = statelessSession
            .createQuery("from Entity")
            .scroll(ScrollMode.FORWARD_ONLY);

    int count = 0;
    while (scrollableResults.next()) {
        if (++count % 100 == 0) {
            System.out.println("Fetched " + count + " entities");
        }
        Entity entity = (Entity) scrollableResults.get()[0];
        // process and write the result
    }
} finally {
    statelessSession.close();
}
biology.info

You should not load all records into memory but process them in batches, e.g. loop over the data 1,000 records at a time using createQuery(...).setFirstResult(i * 1000).setMaxResults(1000);

Bill Lin

You have found the upper limit of your heap. Have a look here to learn how to properly size your heap:

Increase heap size in Java
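
For instance, the maximum heap is set with the JVM's -Xmx flag (the 2g and the jar name below are purely illustrative; the right value depends on your machine and workload):

java -Xmx2g -jar myapp.jar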

However, I cannot imagine why you would need to have a List of 2 million records in memory. Perhaps with more information we could find an alternative solution for your algorithm?

Jorge_B

Yes, of course, you may use Apache™ Hadoop® for a large project. The project develops open-source software for reliable, scalable, distributed computing, designed to scale up from single servers to thousands of machines: hadoop apache

MSR

This is basically a design question for the problem you are working on. Forget Hibernate; even if you did the same thing in plain JDBC, you would hit the memory issue, maybe a bit later. The idea of loading such huge data and keeping it in memory is not suitable for applications requiring short request-response cycles, and it is not good for scalability either. As others have suggested, you can try the batching or paging behaviour, or, if you want to be more exotic, you can try parallel processing via a distributed data grid (like Infinispan) or a map-reduce framework from Hadoop.

Going by the description of your problem, it seems that you need to keep the data around in memory. If you must keep the huge data set in memory, you can query the data in batches and keep storing it in a distributed cache (like Infinispan) that can span multiple JVMs on a single machine, or multiple machines forming a cluster. This way your data will reside partially on each node; here Infinispan can be used as a distributed cache (see the sketch below).
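
Purely as a sketch of that idea, assuming Infinispan's embedded API; the cache name and key/value types are made up for the example, and a real cluster also needs transport configuration:

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class DistributedCacheSketch {
    public static void main(String[] args) {
        // nodes started with the same clustered configuration discover each other
        DefaultCacheManager cm = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());
        // "numbers" is a made-up name; DIST_SYNC spreads entries across the nodes
        cm.defineConfiguration("numbers", new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC).build());
        Cache<Long, Long> numbers = cm.getCache("numbers");
        numbers.put(1L, 42L); // store each fetched batch, entry by entry
        cm.stop();
    }
}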

There are frameworks like Spring Batch which take the route of solving such problems by dividing the work into chunks (batches) and then processing them one by one. Spring Batch even has built-in JPA-based readers and writers which perform this work in batches; a reader sketch follows.
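
As an illustration only, a minimal sketch of such a reader using Spring Batch's JpaPagingItemReader, reusing the question's query; the reader name and page size are arbitrary:

import javax.persistence.EntityManagerFactory;
import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.batch.item.database.builder.JpaPagingItemReaderBuilder;

public class MyNumberReaderConfig {
    // emf is your application's EntityManagerFactory
    public JpaPagingItemReader<Long> myNumberReader(EntityManagerFactory emf) {
        return new JpaPagingItemReaderBuilder<Long>()
                .name("myNumberReader")        // arbitrary name, used for restart state
                .entityManagerFactory(emf)
                .queryString("select t.myNumber from myTable t")
                .pageSize(1000)                // rows loaded per page
                .build();
    }
}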

Shailendra