Definitive way to do efficient batch/bulk inserts with JPA and Hibernate?

Question

While there are other similar questions here, I didn't see any that address all the issues or have a good, definitive answer. Essentially, I am architecting the data access and service layers of a J2EE app using Hibernate-backed JPA.

The application does large data loading/update operations, and I want to make sure these get into the database as efficiently as possible. A correct answer will explain, with code, how to consume a large collection of data for insert, update, and ideally merge. It should cover:

How to set a batch size, in configuration or in code, that is actually respected, and how to code to it using JPA (without using raw Hibernate if possible).
How and when to use the JPA transactional commands/annotations to ensure efficient use of memory and of the first- and second-level caches.
What this statement means: 'Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator'. Is this related to using a sequence for primary identifiers?
Any gotchas I should know about.

Note: I've asked some related questions about Hibernate and J2EE/JPA; if you have anything to add to them, please do.

Both are relatively new technologies to me (see my other questions):

How should EntityManager be used in a nicely decoupled service layer and data access layer?

and

Should raw Hibernate annotated POJO's be returned from the Data Access Layer, or Interfaces instead?

score 4 · Answer 1 · edited Jul 19 '17 at 10:36

I can explain the statement about Hibernate disabling batch inserts when an identity generator is used.

For Hibernate to obtain an identifier for a new entity that uses an identity generator, it must actually perform the insert and then read back the identifier (with a select, or through the JDBC generated-keys mechanism), because the database assigns the value at insert time. This is in contrast to a sequence generator, where Hibernate can obtain as many identifiers as it needs up front (in batches if required) and assign them to the entities before they are inserted.

So the difference is insert-then-select for an identity generator versus select-then-insert for a sequence generator.

Therefore, Hibernate must do the inserts one by one when an identity generator is used, but can batch them up when a sequence generator is used.
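To make the cost difference concrete, here is a rough back-of-the-envelope model in plain Java. It is not Hibernate code, and the class and method names (InsertRoundTrips, identity, sequence) are made up for illustration. It only counts database round trips under two simplifying assumptions: an identity generator costs one round trip per row, while a sequence generator costs one id fetch per allocationSize rows plus one batched insert per batchSize rows.

```java
/** Toy model (hypothetical names, not Hibernate code): counts database round trips. */
final class InsertRoundTrips {

    /** Identity generator: every row is inserted on its own so its key can be read back. */
    static long identity(long rows) {
        return rows; // one INSERT round trip per row, no JDBC batching
    }

    /**
     * Sequence generator: ids are fetched up front, allocationSize at a time,
     * and the INSERTs are then sent in JDBC batches of batchSize.
     */
    static long sequence(long rows, int allocationSize, int batchSize) {
        if (allocationSize <= 0 || batchSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long idFetches = ceilDiv(rows, allocationSize); // "next value" calls
        long insertBatches = ceilDiv(rows, batchSize);  // batched INSERT round trips
        return idFetches + insertBatches;
    }

    /** Integer division rounded up, for non-negative a and positive b. */
    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        System.out.println(identity(10_000));         // prints 10000
        System.out.println(sequence(10_000, 50, 50)); // prints 400
    }
}
```

In a real mapping, the two knobs in this model would correspond roughly to the allocationSize attribute of @SequenceGenerator and the hibernate.jdbc.batch_size property. The actual statement counts depend on the Hibernate version, the id optimizer, and the JDBC driver, so treat the numbers as an order-of-magnitude estimate only.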
