6

I have an @Entity containing a few @OneToMany relationships, but since they consist of collections of Enums, I'm using @ElementCollection. The entity has an id that gets generated at the database level (MySQL).

Here is a small example I just made up that corresponds to the structure of my entity.

@Entity
public class Student {

  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Integer id;

  @ElementCollection(targetClass = Language.class)
  @CollectionTable(name="student_languages", joinColumns=@JoinColumn(name="student_id"))
  private Set<Language> languages;

  @ElementCollection(targetClass = Module.class)
  @CollectionTable(name="student_modules", joinColumns=@JoinColumn(name="student_id"))
  private Set<Module> modules;

  @ElementCollection(targetClass = SeatPreference.class)
  @CollectionTable(name="student_seats", joinColumns=@JoinColumn(name="student_id"))
  private Set<SeatPreference> seatPreference;

[...]
}

I know that GenerationType.IDENTITY deactivates batching, but I thought that would be the case for the main entity only, not for the single properties too. I'm havin to bulk import a few entities (~20k), each with a handful of properties, but Hibernate seems to be generating one insert for each property in the sets, making the import impossibly slow (between 10 and 20 inserts for each record).

I have now spent so long trying to make this faster, that I'm considering just generating an SQL file that I can manually import in the database.

Is there no way to instruct Hibernate to batch inserts the @ElementCollection fields? Am I doing something wrong?

Michele Palmia
  • 2,402
  • 2
  • 25
  • 28
  • 1
    A suggestion from experience: don't bother with Hibernate for performant inserts and auto-ids, but use something more low level like [JdbcTemplate](https://docs.spring.io/spring-framework/docs/current/reference/html/data-access.html#jdbc-advanced-jdbc) or [jOOQ](https://www.jooq.org/doc/latest/manual/sql-execution/importing/). You will save yourself many wasted hours. – ax. Jan 04 '22 at 21:14
  • Did you try the following properties in your hibernate.cfg.xml file: hibernate.jdbc.batch_size=5 hibernate.order_inserts=true Other trick that you could apply, define a batch size app.properties and iterate your collection if it reaches to the batch size flush hibernate session. – Noman ali abbasi Jan 05 '22 at 08:23
  • It batches with the latest MariaDB. I'll test with MySQL shortly. – Alexey Veleshko Jan 05 '22 at 17:15

2 Answers2

2

Basically, seem hibernate will not help with @ElementCollection batching but you can use the SQL bulk inserts. Seems you are on MySQL which does support the bulk inserts and its JDBC driver can automatically modify / rewrite the individual insert statements into one bulk statement if you enable the rewriteBatchedStatements property.

So in your case what you need to do is tell hibernate to enable batching and order the batch inserts and updates.

hibernate.jdbc.batch_size=100
hibernate.order_inserts=true
hibernate.order_updates=true

This will ensure that when inserting the data into DB the inserts statements generated by Hibernate will be executed in a batch and they will be ordered.

So the SQL generated by Hibernate will be something like this:

insert into student_languages (student_id, languages) values (1,1)
insert into student_languages (student_id, languages) values (1,2)
insert into student_languages (student_id, languages) values (1,3)
insert into student_languages (student_id, languages) values (1,4)

Next, you will need to tell the JDBC driver to rewrite the individual inserts into the bulk insert by setting the rewriteBatchedStatements=true

jdbc:mysql://db:3306/stack?useSSL=false&rewriteBatchedStatements=true

So this will instruct the driver to rewrite the inserts into bulk form, so the above several SQL statements will be rewritten into something like this

insert into student_languages (student_id, languages) values (1,1),(1,2),(1,3),(1,4)

Just as an info this may not work if you are using old versions of the MySQL driver and Hibernate.

Babl
  • 7,446
  • 26
  • 37
  • ANSI SQL is an ancient standard. Batch inserts were standardized many years ago. – Alexey Veleshko Jan 05 '22 at 13:58
  • @AlexeyVeleshko it would be helpful if you can share the link to the bulk insert SQL standard, so I can update the answer. – Babl Jan 05 '22 at 14:03
  • I can't link to the SQL-92 standard itself (they are not freely distributed), but this Wikipedia link should be enough: https://en.wikipedia.org/wiki/Insert_(SQL)#Multirow_inserts – Alexey Veleshko Jan 05 '22 at 14:34
  • Batching also appears to be synonymous with bulk inserts, see https://vladmihalcea.com/the-best-way-to-do-batch-processing-with-jpa-and-hibernate/ where Hibernate is configured for batching and generates bulk inserts without any JDBC driver-specific configuration. – nickb Jan 05 '22 at 15:00
  • @nickb yep, but that does not seem to work with ```@ElementCollections```, so you need to rewrite on driver level. – Babl Jan 05 '22 at 15:08
  • Right, I think the premise behind the question is *why* doesn't batching work with `@ElementCollection` when it does for inserting top-level entities, and is there a way to get Hibernate to bulk insert `@ElementCollection`. You have found one solution, the MySQL driver configuration `rewriteBatchedStatements`. I also wonder if there is a solution that doesn't depend on which JDBC driver is in use. Nevertheless, thank you for your answer and your research. – nickb Jan 05 '22 at 15:27
1

I tested this both with MySQL and MariaDB and actually Hibernate does batch inserts into the collection table. But it's not visible to the naked eye, you have to use DataSource-Proxy to see it:

INFO  com.example.jpa.AddStudents - Adding students
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener - 
Name:dataSource, Connection:3, Time:1, Success:True
Type:Prepared, Batch:False, QuerySize:1, BatchSize:0
Query:["insert into student (name) values (?)"]
Params:[(Smith)]
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener - 
Name:dataSource, Connection:3, Time:1, Success:True
Type:Prepared, Batch:False, QuerySize:1, BatchSize:0
Query:["insert into student (name) values (?)"]
Params:[(Snow)]
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener - 
Name:dataSource, Connection:3, Time:78, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:6
Query:["insert into student_languages (student_id, language) values (?, ?)"]
Params:[(6,2),(6,0),(6,1),(7,0),(7,4),(7,3)]
INFO  com.example.jpa.AddStudents - Added

The SEQUENCE ID generator is considered the best for Hibernate. It doesn't create lock contention as the TABLE generator does and allows for batching. It is unfortunate that MySQL doesn't support sequences still (MariaDB does).

Am I doing something wrong?

Hibernate is optimized for small-scale changes in the database. It maintains a first-level cache and also supports a second-level cache which will only hinder performance for large-scale operations. Therefore, indeed, you might be better off using JDBC or jOOQ for this particular operation as was suggested in the comments.


I used MySQL 8.0.3, MariaDB 10.5.13 and Hibernate 5.6.3.Final.

Alexey Veleshko
  • 792
  • 1
  • 18