3

I'm using Spring Data, JPA, and Hibernate to perform a function on each record greater than a given ID.

Here's my DAO:

public interface MyEntityDao extends JpaRepository<MyEntity, Long>, {

    @QueryHints(value = @QueryHint(name = org.hibernate.jpa.QueryHints.HINT_FETCH_SIZE, value = "1000"))
    Stream<MyEntity> findByIdGreaterThanOrderByIdAsc(Long id);
}

The method gets used like this, and it works:

@Transactional(readOnly = true)
public void printRecordsGreaterThan(Long lastId) {
    myEntityDao.findByIdGreaterThanOrderByIdAsc(lastId).forEach((entity) -> {
        System.out.println("entity: " entity.getId());
    });
}

The issue is when this operation needs to scan a very large range. I monitored it with VisualVM and it's keeping all the records in memory (Tens of Gigs worth of RAM).

Is there any way to have this code release the resources once they're processed rather than keep them in memory?

Thanks in advance!

Solution

Thanks to @julodnik in the comments, invoking clear() on the entity manager every so often solved the issue.

@PersistenceContext
private EntityManager em;

@Transactional(readOnly = true)
public void printRecordsGreaterThan(Long lastId) {
    AtomicLong counter = new AtomicLong();
    myEntityDao.findByIdGreaterThanOrderByIdAsc(lastId).forEach((entity) -> {
        long count = counter.getAndIncrement();
        if (count % 1000 == 0) {
            logger.info(String.format("Clearing %s session for result %d",  type.toString(), counter.get()));   
            em.clear();
        }
        System.out.println("entity: " entity.getId());
    });
}
Cuga
  • 17,668
  • 31
  • 111
  • 166
  • We have been using this approach for quite some time, but unfortunately it seems to break after updating to Spring boot 3. Guess something must have changed in Hibernate 6.x. – Daniel Dec 19 '22 at 11:32

1 Answers1

3

You can use setFirstResult and setMaxResults to iterate over a large resultset. You can find an example in this related question.

Another issue which comes to my mind is that you might have eager fetch set by default. This means you might be getting all related entities and if they have related entities they will be fetched as well. You should switch on the sql statements in your log file to check if this is happening.

--- EDIT to answer the comment

If the objects are not being referenced in java, then you can clear the first level cache with flush and clear (in entitymanager). That should clear all the loaded objects.

julodnik
  • 359
  • 2
  • 8
  • It iterates lazily just fine. The issue is that it's not fetching data in chunks equal to the 'fetch size' and disposing of them once they've been processed. They're staying in memory until the entire result set is processed. – Cuga Oct 09 '19 at 13:23
  • Thanks! Adding a `clear()` on the entity manager did the trick! I updated the post with code to show how it worked for me, but you should get credit for the advice. – Cuga Oct 09 '19 at 17:43