2

I have an application that uses Hibernate and it's running out of memory with a medium volume dataset (~3 million records). When analysing the memory dump using Eclipse's Memory Analyser I can see that StatefulPersistenceContext appears to be holding a copy of the record in memory in addition to the object itself, doubling the memory usage.

I'm able to reproduce this on a slightly smaller scale with a defined workflow, but am unable to simplify it to the level that I can put the full application here. The workflow is:

  1. Insert ~400,000 records (Fruit) into the database from a file
  2. Get all of the Fruits from the database and find if there are any complementary items to create ~150,000 Baskets (containing two Fruits)
  3. Retrieve all of the data - Fruits & Baskets - and save to a file

It's running out of memory at the final stage, and the heap dump shows StatefulPersistenceContext has hundreds of thousands of Fruits in memory, in addition to the Fruits we retrieved to save to the file.

I've looked around online and the suggestion appears to be to use QueryHints.READ_ONLY on the query (I put it on the getAll), or to wrap it in a Transaction with the readOnly property set - but neither of these seems to have stopped the massive StatefulPersistenceContext.

Is there something else I should be looking at?

Examples of the classes / queries I'm using:

public interface ShoppingService {
    public void createBaskets();

    public void loadFromFile(ObjectInput input);

    public void saveToFile(ObjectOutput output);
}
@Service
public class ShoppingServiceImpl implements ShoppingService {
    @Autowired
    private FruitDAO fDAO;

    @Autowired
    private BasketDAO bDAO;

    @Override
    public void createBaskets() {
        bDAO.add(Basket.generate(fDAO.getAll()));
    }

    @Override
    public void loadFromFile(ObjectInput input) {
        SavedState state = ((SavedState) input.readObject());

        fDAO.add(state.getFruits());
        bDAO.add(state.getBaskets());
    }

    @Override
    public void saveToFile(ObjectOutput output) {
        output.writeObject(new SavedState(fDAO.getAll(), bDAO.getAll()));
    }

    public static void main(String[] args) throws Throwable {
        ShoppingService service = null;

        try (ObjectInput input = new ObjectInputStream(new FileInputStream("path\\to\\input\\file"))) {
            service.loadFromFile(input);
        }

        service.createBaskets();

        try (ObjectOutput output = new ObjectOutputStream(new FileOutputStream("path\\to\\output\\file"))) {
            service.saveToFile(output);
        }
    }
}
@Entity
public class Fruit {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String name;

    // ~ 200 string fields
}
public interface FruitDAO {
    public void add(Collection<Fruit> elements);

    public List<Fruit> getAll();
}
@Repository
public class JPAFruitDAO implements FruitDAO {
    @PersistenceContext
    private EntityManager em;

    @Override
    @Transactional()
    public void add(Collection<Fruit> elements) {
        elements.forEach(em::persist);
    }

    @Override
    public List<Fruit> getAll() {
        return em.createQuery("FROM Fruit", Fruit.class).getResultList();
    }
}
@Entity
public class Basket {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    @OneToOne
    @JoinColumn(name = "arow")
    private Fruit aRow;

    @OneToOne
    @JoinColumn(name = "brow")
    private Fruit bRow;

    public static Collection<Basket> generate(List<Fruit> fruits) {
        // Some complicated business logic that does things
        return null;
    }
}
public interface BasketDAO {
    public void add(Collection<Basket> elements);

    public List<Basket> getAll();
}
@Repository
public class JPABasketDAO implements BasketDAO {
    @PersistenceContext
    private EntityManager em;

    @Override
    @Transactional()
    public void add(Collection<Basket> elements) {
        elements.forEach(em::persist);
    }

    @Override
    public List<Basket> getAll() {
        return em.createQuery("FROM Basket", Basket.class).getResultList();
    }
}
public class SavedState {
    private Collection<Fruit> fruits;
    private Collection<Basket> baskets;
}
Jakg
  • 2
    Are you sure you're using `QueryHints.READ_ONLY` correctly? The context shouldn't hold 2 copies of each entity, only one. – Olivier Nov 12 '22 at 09:12
  • 1
    @Olivier I've updated the question to make it clearer what I tried - I thought I did it right, but it didn't work, so... To be clear, it's not holding two copies internally - the `StatefulPersistenceContext` is holding one copy in addition to the object it already gave me. – Jakg Nov 12 '22 at 13:46
  • 1
    *"in addition to the object it already gave me"* It should be the same instance, not a copy. – Olivier Nov 12 '22 at 14:00
  • 1
    @Olivier Eclipse Memory Analyser is listing them as two separate objects, each with its own size, so that doesn't appear to be the case. – Jakg Nov 12 '22 at 15:50
  • 1
    Storing a huge amount of data in a file under a single root object does not allow it to be read or written in parts. I would store the number of fruits first, then the list of fruits, and do the same for baskets. Also try using paginated queries and serializing the results to the file page by page (after counting them). If your paginated queries return JPA entities, they will stay in the persistence context, and in the end all your fruits and baskets will be in memory. To avoid that, you can restart the Hibernate EntityManager, or write queries that return DTOs instead of entities. – Pierre Demeestere Nov 13 '22 at 16:17
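
The "count first, then page by page" idea from the last comment could be sketched roughly as follows. This is only an illustration under assumptions: `PagedWriter`, `writePaged`, `readAll` and the `fetchPage` function are hypothetical names that do not come from the question. In a real application `fetchPage` would presumably wrap a paginated JPA query (`setFirstResult` / `setMaxResults`) and detach the page afterwards, for example with `EntityManager.clear()`.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper: writes a count header followed by the elements,
// fetching and serializing one page at a time.
class PagedWriter {

    // fetchPage receives a zero-based page index and returns an empty
    // list once there is nothing left to read.
    static <T> void writePaged(ObjectOutputStream out, long total,
                               IntFunction<List<T>> fetchPage) throws IOException {
        out.writeLong(total);
        for (int page = 0; ; page++) {
            List<T> items = fetchPage.apply(page);
            if (items.isEmpty()) {
                break;
            }
            for (T item : items) {
                out.writeObject(item);
            }
            // ObjectOutputStream remembers every object it has written;
            // reset() drops those references so earlier pages can be collected.
            out.reset();
        }
        out.flush();
    }

    // Reads the count header, then exactly that many objects.
    static List<Object> readAll(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        long total = in.readLong();
        List<Object> result = new ArrayList<>();
        for (long i = 0; i < total; i++) {
            result.add(in.readObject());
        }
        return result;
    }
}
```

The `reset()` call matters here: without it, the stream itself would keep every written object reachable, regardless of what the persistence context does.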

3 Answers

1

Have a look at this answer: "How does Hibernate detect dirty state of an entity object?"

Without access to the heap dump or your complete code, I believe you are seeing exactly what you describe. As long as Hibernate believes that the entities might change, it keeps a complete copy of their loaded state in memory so that it can compare the current state of each object with the state originally loaded from the database. At the end of the transaction (the transactional block of code), it then automatically writes any changes to the database. It needs to know what the state of the object used to be in order to avoid a large number of (potentially expensive) write operations.

I believe that making the transactional block read-only is a step in the right direction. I'm not completely sure, but I hope the information here at least helps you understand why you are seeing such large memory consumption.
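
To make the snapshot idea concrete, here is a deliberately simplified toy model. It is not Hibernate's actual implementation, and `ToyPersistenceContext`, `Entry` and their methods are hypothetical names invented for this sketch. It only shows why a context that does dirty checking holds extra per-entity state, and why a read-only entry can skip it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: an "entity" is a String[] of field values.
class ToyPersistenceContext {

    // One managed entity: the live values plus, unless read-only,
    // a copy of the values as they were loaded.
    static final class Entry {
        final String[] current;
        final String[] loadedState; // null when the entry is read-only

        Entry(String[] current, boolean readOnly) {
            this.current = current;
            this.loadedState = readOnly ? null : current.clone();
        }

        // Dirty checking compares the live values with the snapshot.
        // Read-only entries have no snapshot and are never reported dirty.
        boolean isDirty() {
            return loadedState != null && !Arrays.equals(current, loadedState);
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Registers a loaded row and hands the live values back to the caller.
    String[] load(String[] row, boolean readOnly) {
        Entry entry = new Entry(row, readOnly);
        entries.add(entry);
        return entry.current;
    }

    // Number of entries that carry a loaded-state snapshot.
    long snapshotCount() {
        return entries.stream().filter(e -> e.loadedState != null).count();
    }

    // Number of entries whose live values differ from their snapshot.
    long dirtyCount() {
        return entries.stream().filter(Entry::isDirty).count();
    }
}
```

In this model the context still references every loaded entity even in read-only mode; only the snapshot goes away. That matches the comments above, where the instance returned to the caller and the one held by the context are expected to be the same object.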

Nathan
  • Just to be clear - you're saying that you believe that, if I've tried the read-only stuff and it hasn't addressed it, this is just how Hibernate works? – Jakg Nov 16 '22 at 11:16
  • In the example that you posted, you have only 2 Transactional annotations. One is when you are persisting new Basket objects and the other is where you are persisting new Fruit objects. That means you have one of the following... 1) implicit transaction handling where we don't see the configuration or 2) Transaction annotations in other places that aren't included in your example. This means that it is difficult to assess where the problem might be. My expectation would be that if correctly configured, then the memory footprint would be reduced. – Nathan Nov 17 '22 at 12:21
  • See the question / answer / discussion here: https://stackoverflow.com/questions/49955727/hibernate-read-only-entities-says-saves-memory-by-deleting-database-snapshots – Nathan Nov 17 '22 at 12:24
1

1: Fetching all Fruits from the DB at once, or persisting a large set of Baskets at once, hurts both DB performance and application performance because of the huge number of objects on the heap (in the young or old generation, depending on how long the objects survive). Process the data in batches instead of all at once: use Spring Batch, or implement custom logic that processes the data in chunks.

2: The persistence context stores newly created and modified entities in memory. Hibernate sends these changes to the database when the transaction is synchronized. This generally happens at the end of a transaction. However, calling EntityManager.flush() also triggers a transaction synchronization. Secondly, the persistence context serves as an entity cache, also referred to as the first level cache. To clear entities in the persistence context, we can call EntityManager.clear().

A reference for batch processing can be found here.

3: If you don't plan on modifying Fruit, you could just fetch the entries in read-only mode: Hibernate will not retain the dehydrated state, which it normally uses for the dirty checking mechanism. So, you get half the memory footprint.
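
The chunked approach from points 1 and 2 might look roughly like the sketch below. `ChunkedPersist` and `persistInChunks` are hypothetical names, and the persist and flush-and-clear actions are passed in as plain functions so that the example does not depend on any particular JPA setup.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: persists items one by one and runs a
// "flush and clear" action after every chunkSize items.
class ChunkedPersist {

    // Returns how many times flushAndClear was run.
    static <T> int persistInChunks(List<T> items, int chunkSize,
                                   Consumer<? super T> persist,
                                   Runnable flushAndClear) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int flushes = 0;
        int pending = 0;
        for (T item : items) {
            persist.accept(item);
            if (++pending == chunkSize) {
                flushAndClear.run();
                flushes++;
                pending = 0;
            }
        }
        // Flush whatever is left over from the last, partial chunk.
        if (pending > 0) {
            flushAndClear.run();
            flushes++;
        }
        return flushes;
    }
}
```

Inside a transactional DAO method, and assuming an injected `EntityManager em` as in the question, a call could look like `ChunkedPersist.persistInChunks(elements, 1000, em::persist, () -> { em.flush(); em.clear(); });`. The chunk size of 1000 is an arbitrary placeholder. Note that `clear()` detaches everything in the context, so any entity still needed afterwards has to be re-read or merged.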

Bhushan Uniyal
1

Quick solution: if you only run this method once, to create the database, increase the JVM's -Xmx value.

Real solution: when you persist everything at once, all of the data is kept in memory until the commit, and memory is quickly used up. Instead, save the data in parts, as dump tools do. For example:

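EntityManager em = ...;
try {
    for (Fruit fruit : fruits) {
        // One short transaction per entity
        em.getTransaction().begin();
        try {
            em.persist(fruit);
            em.getTransaction().commit();
        } finally {
            // Roll back if the commit did not complete
            if (em.getTransaction().isActive()) {
                em.getTransaction().rollback();
            }
        }
    }
} finally {
    // Close the EntityManager once, after the loop, not per iteration
    if (em.isOpen()) {
        em.close();
    }
}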
utrucceh