I have a batch process that recalculates data for a set of entities. The list of entities is fetched from the DB by Hibernate:

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void recalculateUserData(Long userId){
    List<Entity> data = repository.getAllPendingRecalculation(userId);

    List<Entity> recalculated = new LinkedList<Entity>();

    for (Entity entity : data){
        recalculateEntity(entity, recalculated);
        recalculated.add(entity);
        flushIfNeeded(recalculated); //every 10 records
    }
}

private void recalculateEntity(Entity entity, List<Entity> recalculated){
    //do logic
}

private void flush(){
    getSession().flush();
    getSession().clear();
}

private void flushIfNeeded(List<Entity> data) {
    int flushSize = 10;
    if (data.size() % flushSize == 0){
        flush();
    }
}

When the process runs it looks like some entities are becoming detached, causing two symptoms:

  1. When trying to fetch lazy data I get an exception: org.hibernate.LazyInitializationException - could not initialize proxy - no Session.
  2. When no lazy load is needed - only the first 10 records are updated in DB, even though flushIfNeeded(...) is working ok.

My first attempt was to call session#refresh(entity) inside recalculateEntity(...). This solved the lazy initialization issue, but issue #2 still occurred:

private void recalculateEntity(Entity entity){
    getSession().refresh(entity);
    //do logic
}

Since this didn't solve the issue, I thought about using attach(entity) instead of refresh(entity):

private void recalculateEntity(Entity entity){
    getSession().attach(entity);
    //do logic
}

This seems to work, but my question is: Why did these entities get detached in the first place?

(I'm using Hibernate 3.6.10)


Update

As @galovics explained:

The problem is that you are clearing the whole session which holds all your managed entities, making them detached.

Hibernate's batch processing documentation indicates that batch updates should be performed using ScrollableResults (which resolves these issues), but in this case I have to fetch all the data before processing it, because an entity's calculation might depend on entities that were already calculated. For example, calculating Entity#3 might require data calculated for Entity#1 and Entity#2.

For a case like this, would it be better to use Session#attach(entity) (as shown in the code), to call Session#flush() without Session#clear(), or is there a better solution?

Tomer A

1 Answer

The problem is that you are clearing the whole session which holds all your managed entities, making them detached.
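
To make the mechanism concrete, here is a minimal sketch using a toy stand-in for the session. It is not Hibernate code: `ToySession`, `ToyEntity` and `ClearDetachDemo` are hypothetical names, and the final `flush()` is an assumed stand-in for the flush that happens at transaction commit. It only illustrates why a list fetched up front stops being tracked once the context is cleared, which would match the "only the first 10 records are updated" symptom.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: it imitates the idea of a persistence context, not Hibernate's implementation.
final class ClearDetachDemo {
    static final class ToyEntity {
        final long id;
        int value;

        ToyEntity(long id, int value) {
            this.id = id;
            this.value = value;
        }
    }

    static final class ToySession {
        // Stand-in for the database table: id -> stored value.
        private final Map<Long, Integer> database = new TreeMap<Long, Integer>();
        // Stand-in for the persistence context: only these objects are written on flush().
        private final Set<ToyEntity> managed =
                Collections.newSetFromMap(new IdentityHashMap<ToyEntity, Boolean>());

        void insertRow(long id, int value) {
            database.put(id, value);
        }

        // Loads every row and registers the resulting objects as managed.
        List<ToyEntity> loadAll() {
            List<ToyEntity> result = new ArrayList<ToyEntity>();
            for (Map.Entry<Long, Integer> row : database.entrySet()) {
                ToyEntity entity = new ToyEntity(row.getKey(), row.getValue());
                managed.add(entity);
                result.add(entity);
            }
            return result;
        }

        // Writes the state of all managed objects back to the "database".
        void flush() {
            for (ToyEntity entity : managed) {
                database.put(entity.id, entity.value);
            }
        }

        // Forgets every managed object; the objects still exist but are now detached.
        void clear() {
            managed.clear();
        }

        int countRowsWithValue(int value) {
            int count = 0;
            for (int stored : database.values()) {
                if (stored == value) {
                    count++;
                }
            }
            return count;
        }
    }

    // Recalculates 25 rows, flushing every 10; returns how many rows end up updated.
    static int updatedRows(boolean clearAfterFlush) {
        ToySession session = new ToySession();
        for (long id = 1; id <= 25; id++) {
            session.insertRow(id, 0);
        }
        List<ToyEntity> data = session.loadAll(); // everything is fetched up front
        int processed = 0;
        for (ToyEntity entity : data) {
            entity.value = 1; // the "recalculation"
            processed++;
            if (processed % 10 == 0) {
                session.flush();
                if (clearAfterFlush) {
                    session.clear(); // detaches entities 11..25 before they are processed
                }
            }
        }
        session.flush(); // assumed stand-in for the flush at transaction commit
        return session.countRowsWithValue(1);
    }

    public static void main(String[] args) {
        System.out.println("flush + clear: " + updatedRows(true));  // prints "flush + clear: 10"
        System.out.println("flush only: " + updatedRows(false));    // prints "flush only: 25"
    }
}
```

In this toy model, the first `clear()` drops every object that was loaded up front, so later changes to records 11 and onward are never seen by `flush()`. Skipping `clear()` keeps them tracked, at the cost of holding all of them in the context.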

If you are working with just part of the data at a time, make sure you fetch only that part; then you can safely clear the whole session, fetch the next batch, and repeat the calculation.
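
The fetch, process, clear loop described above could look roughly like the sketch below. It deliberately avoids Hibernate types so it runs on its own: `Item`, `PageFetcher`, `fetchAfter` and `inMemory` are hypothetical placeholders, where `fetchAfter` stands for a paged query ordered by id and the `Runnable` stands for the question's `flush()` helper (flush followed by clear).

```java
import java.util.ArrayList;
import java.util.List;

final class BatchLoopSketch {
    // Hypothetical stand-in for an entity with a numeric primary key.
    static final class Item {
        final long id;
        boolean recalculated;

        Item(long id) {
            this.id = id;
        }
    }

    // Hypothetical paged lookup: up to `limit` items with id > afterId, ascending by id.
    interface PageFetcher {
        List<Item> fetchAfter(long afterId, int limit);
    }

    // Fetch a batch, process it, flush and clear, then fetch the next batch.
    // Returns the number of batches processed.
    static int processInBatches(PageFetcher fetcher, int batchSize, Runnable flushAndClear) {
        long lastId = Long.MIN_VALUE;
        int batches = 0;
        while (true) {
            List<Item> page = fetcher.fetchAfter(lastId, batchSize);
            if (page.isEmpty()) {
                return batches;
            }
            for (Item item : page) {
                item.recalculated = true; // placeholder for the real recalculation
                lastId = item.id;
            }
            flushAndClear.run(); // nothing from this batch is touched again afterwards
            batches++;
        }
    }

    // In-memory fetcher over a list sorted by id, used only to exercise the loop.
    static PageFetcher inMemory(final List<Item> sortedById) {
        return new PageFetcher() {
            @Override
            public List<Item> fetchAfter(long afterId, int limit) {
                List<Item> page = new ArrayList<Item>();
                for (Item item : sortedById) {
                    if (item.id > afterId && page.size() < limit) {
                        page.add(item);
                    }
                }
                return page;
            }
        };
    }

    public static void main(String[] args) {
        List<Item> all = new ArrayList<Item>();
        for (long id = 1; id <= 25; id++) {
            all.add(new Item(id));
        }
        final int[] clears = {0};
        int batches = processInBatches(inMemory(all), 10, new Runnable() {
            @Override
            public void run() {
                clears[0]++;
            }
        });
        System.out.println(batches + " batches, " + clears[0] + " clears"); // prints "3 batches, 3 clears"
    }
}
```

This shape only fits when each batch is self-contained. If a later entity needs data from an earlier, already cleared batch, that data would have to be fetched again.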

Article on LazyInitializationException just to clarify it.

Arnold Galovics
  • Thanks @galovics, makes a lot of sense. [Hibernate's documentation](https://docs.jboss.org/hibernate/orm/3.6/reference/en-US/html/batch.html#batch-update) indicates that this can be resolved by using a scrollable result set (so only the updated entities are cleared from session cache). The problem is that my logic must pre-fetch all the data before the calculation process is performed as some calculations depend on other pre-fetched entities. In a case like this, what do you think would be better - using `session.attach(entity)` or using `session.flush()` without `session.clear`? – Tomer A May 09 '17 at 08:57
  • You can use `Session#evict` to remove the entity from the persistence context. If you are using EntityManager, you can go with the `detach` method. – Arnold Galovics May 09 '17 at 09:09
  • The logic is a bit more complex as a calculation of Entity3 might need to use data in Entity2 & Entity1, and this might require lazy load. If I evict or detach Entity2 or Entity1, the calculation of Entity3 would fail. – Tomer A May 09 '17 at 09:17
  • Please read my article, you want to use `JOIN FETCH` for this case. – Arnold Galovics May 09 '17 at 12:19
  • I read your article and it's a good one, but I'm not sure I can use join fetch on all relations due to system limitations. I simplified the example for the purpose of presenting the question, but the actual `recalculateEntity` logic is not as simple. The needed data for each type of entity varies from type to type. For that reason, I prefer to not eagerly load all the data. I am trying to use `JOIN FETCH` and eager load where I can. – Tomer A May 09 '17 at 13:13
  • If this is very use-case specific, then create multiple methods for each of the use cases which fetches all the data required for that particular case. – Arnold Galovics May 09 '17 at 13:20
  • Basically that is what happens inside the recalculation logic. Unfortunately, I can't calculate each use case separately as there are dependencies between use cases. I was discussing this issue with my development team, and it seems like the best approach to our use case is to suppress the session clear calls, while knowing it might require use of extra memory by objects in the session cache. – Tomer A May 09 '17 at 13:31
  • If you are fine with the additional memory footprint and the CPU time waste, then you can go without clearing the session. :-) – Arnold Galovics May 09 '17 at 14:09