106

When calling the saveAll method of my JpaRepository with a long List<Entity> from the service layer, trace logging of Hibernate shows single SQL statements being issued per entity.

Can I force it to do a bulk insert (i.e. multi-row) without needing to manually fiddle with EntityManger, transactions etc. or even raw SQL statement strings?

With multi-row insert I mean not just transitioning from:

start transaction
INSERT INTO table VALUES (1, 2)
end transaction
start transaction
INSERT INTO table VALUES (3, 4)
end transaction
start transaction
INSERT INTO table VALUES (5, 6)
end transaction

to:

start transaction
INSERT INTO table VALUES (1, 2)
INSERT INTO table VALUES (3, 4)
INSERT INTO table VALUES (5, 6)
end transaction

but instead to:

start transaction
INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)
end transaction

In PROD I'm using CockroachDB, and the difference in performance is significant.

Below is a minimal example that reproduces the problem (H2 for simplicity).


./src/main/kotlin/ThingService.kt:

package things

import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.RestController
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.data.jpa.repository.JpaRepository
import javax.persistence.Entity
import javax.persistence.Id
import javax.persistence.GeneratedValue

interface ThingRepository : JpaRepository<Thing, Long> {
}

@RestController
class ThingController(private val repository: ThingRepository) {
    @GetMapping("/test_trigger")
    fun trigger() {
        val things: MutableList<Thing> = mutableListOf()
        for (i in 3000..3013) {
            things.add(Thing(i))
        }
        repository.saveAll(things)
    }
}

@Entity
data class Thing (
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
)

@SpringBootApplication
class Application {
}

fun main(args: Array<String>) {
    runApplication<Application>(*args)
}

./src/main/resources/application.properties:

jdbc.driverClassName = org.h2.Driver
jdbc.url = jdbc:h2:mem:db
jdbc.username = sa
jdbc.password = sa

hibernate.dialect=org.hibernate.dialect.H2Dialect
hibernate.hbm2ddl.auto=create

spring.jpa.generate-ddl = true
spring.jpa.show-sql = true

spring.jpa.properties.hibernate.jdbc.batch_size = 10
spring.jpa.properties.hibernate.order_inserts = true
spring.jpa.properties.hibernate.order_updates = true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data = true

./build.gradle.kts:

import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

plugins {
    val kotlinVersion = "1.2.30"
    id("org.springframework.boot") version "2.0.2.RELEASE"
    id("org.jetbrains.kotlin.jvm") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.spring") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.jpa") version kotlinVersion
    id("io.spring.dependency-management") version "1.0.5.RELEASE"
}

version = "1.0.0-SNAPSHOT"

tasks.withType<KotlinCompile> {
    kotlinOptions {
        jvmTarget = "1.8"
        freeCompilerArgs = listOf("-Xjsr305=strict")
    }
}

repositories {
    mavenCentral()
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile("org.springframework.boot:spring-boot-starter-data-jpa")
    compile("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    compile("org.jetbrains.kotlin:kotlin-reflect")
    compile("org.hibernate:hibernate-core")
    compile("com.h2database:h2")
}

Run:

./gradlew bootRun

Trigger DB INSERTs:

curl http://localhost:8080/test_trigger

Log output:

Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Tobias Hermann
  • 9,936
  • 6
  • 61
  • 134
  • Please check my answer, hope it will be helpful: https://stackoverflow.com/a/50694902/5380322 – Cepr0 Jun 09 '18 at 08:26
  • @Cepr0 Thanks, but I'm already doing this (accumulating in a list and calling `saveAll`. I just added a minimal code example to reproduce the problem. – Tobias Hermann Jun 09 '18 at 17:15
  • Did you set `hibernate.jdbc.batch_size` property? – Cepr0 Jun 09 '18 at 18:49
  • @Cepr0 Yes. (see above) – Tobias Hermann Jun 09 '18 at 19:23
  • 3
    It's incorrect, it must be in this form: `spring.jpa.properties.hibernate.jdbc.batch_size` – Cepr0 Jun 09 '18 at 20:01
  • @Cepr0 Thanks, rieckpil already [mentioned](https://stackoverflow.com/questions/50772230/how-to-do-bulk-inserts-with-jparepository?noredirect=1#comment88562812_50772262) that and I adjusted my code accordingly. However it is still not batching. – Tobias Hermann Jun 09 '18 at 20:05
  • What you are showing is batch insert. Bulk insert is a much faster technique, but it is specific to the database and it is not supported by JPA. – Razvan P Dec 09 '21 at 20:47

6 Answers6

158

To get a bulk insert with Spring Boot and Spring Data JPA you need only two things:

  1. set the option spring.jpa.properties.hibernate.jdbc.batch_size to appropriate value you need (for example: 20).

  2. use saveAll() method of your repo with the list of entities prepared for inserting.

Working example is here.

Regarding the transformation of the insert statement into something like this:

INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)

the such is available in PostgreSQL: you can set the option reWriteBatchedInserts to true in jdbc connection string:

jdbc:postgresql://localhost:5432/db?reWriteBatchedInserts=true

then jdbc driver will do this transformation.

Additional info about batching you can find here.

UPDATED

Demo project in Kotlin: sb-kotlin-batch-insert-demo

UPDATED

Hibernate disables insert batching at the JDBC level transparently if you use an IDENTITY identifier generator.

Eugene
  • 117,005
  • 15
  • 201
  • 306
Cepr0
  • 28,144
  • 8
  • 75
  • 101
  • Thanks. I'm trying to get your Kotlin demo running, but did not yet succeed. I do `git clone https://github.com/Cepr0/sb-kotlin-batch-insert-demo`, `cd sb-kotlin-batch-insert-demo` and `mvn package` but then end up with the following error: https://gist.github.com/Dobiasd/7f1163110b52876f171d43e17af0853c – Tobias Hermann Jun 18 '18 at 12:02
  • @Cepr0, I just tried your program with mySql db but it does not work as expected. Is there anything to do with driver. Here is a property that I am using, ``` spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver spring.jpa.properties.hibernate.dialect = org.hibernate.dialect.MySQL5Dialect ``` – Shaunak Patel Dec 02 '18 at 14:11
  • @ShaunakPatel What exactly doesn't work and in which program, java or kotlin? – Cepr0 Dec 02 '18 at 14:37
  • @Cepr0 in Java. I see only one difference compared to your program. 1) Database (I am using MySQL). Mean, I am running your code against MySQL – Shaunak Patel Dec 02 '18 at 14:59
  • Is there one way to intercept or listen to list of saveAll(List..) method? – – Jonathan JOhx Dec 17 '18 at 19:11
  • What to use if I have no saveAll()? Spring Boot 1.5.1.RELEASE. – Woland Jan 17 '19 at 09:05
  • Batching with strong and JPA https://medium.com/@clydecroix/batching-database-writes-in-spring-479bee626fbf?sk=8ee224e83a830a6cce92fa4e3e76967e – Clyde D'Cruz Mar 24 '20 at 18:07
  • Is it possible to also use a ON CONFLICT (I'll like to ignore them in my situation). My constraint is set for a unique fields combination – Vincent Jul 20 '20 at 08:32
  • Thanks this worked.. Was trying for few hours..!!! – sinsuren Feb 23 '22 at 03:02
  • @Cepr0 `Regarding the transformation of the insert statement into something like this`... If I understand correctly, I can't get the benefit of batching my insert/update statements if I'm not using PostgreSQL with reWriteBatchedInserts=true. Is this statement correct? Is there another form of batching that doesn't include writing the values in a comma-delimited fashion? – KidCrippler Jun 18 '22 at 21:50
  • Is reWriteBatchedInserts available for SQL server? – Mark Sep 19 '22 at 16:40
17

The underlying issues is the following code in SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

In addition to the batch size property settings, you have to make sure that the class SimpleJpaRepository calls persist and not merge. There are a few approaches to resolve this: use an @Id generator that does not query sequence, like

@Id
@GeneratedValue(generator = "uuid2")
@GenericGenerator(name = "uuid2", strategy = "uuid2")
var id: Long

Or forcing the persistence to treat the records as new by having your entity implement Persistable and overriding the isNew() call

@Entity
class Thing implements Pesistable<Long> {
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
    @Transient
    private boolean isNew = true;
    @PostPersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }
    @Override
    boolean isNew() {
        return isNew;
    }
}

Or override the save(List) and use the entity manager to call persist()

@Repository
public class ThingRepository extends SimpleJpaRepository<Thing, Long> {
    private EntityManager entityManager;
    public ThingRepository(EntityManager entityManager) {
        super(Thing.class, entityManager);
        this.entityManager=entityManager;
    }

    @Transactional
    public List<Thing> save(List<Thing> things) {
        things.forEach(thing -> entityManager.persist(thing));
        return things;
    }
}

The above code is based on the following links:

Jean Marois
  • 1,510
  • 11
  • 19
  • 1
    Thanks Jean for sharing useful links. But there is still an issue with persisting the `@Generated` `@Id` values using the `Persistable` method. The batch is only executed when I manually set the `id` field by my own logic. If I rely on `@Generated` for my `Long` `id` property, then the statements do no run in batches. All the links shared by you don't use `@Generated` type strategy with `Persistable` method. I even checked the Github code link that is provided in the 2nd link, but it also is assigning the `id` property manually. – iamharish15 Jul 29 '18 at 05:49
  • I think this reply wasn't really understood (and appreciated enough). I found out the same problem with saveAll myself. So to rephrase the issue: if you HAVE working batching, your entity does NOT use generated ID, and you use SimpleJpaRepository with saveAll, then: 1.saveAll will use save in loop 2. save will call entityInformation.isNew(entity) getting response false for every call. 3. will call merge for every entity. 4. IIUC these merges calls select first, and those cannot be batched so this will create N+1 issue, due to incorrect saveAll implementation. – Martin Mucha Dec 13 '19 at 09:48
  • 2
    Batching with spring and JPA https://medium.com/@clydecroix/batching-database-writes-in-spring-479bee626fbf?sk=8ee224e83a830a6cce92fa4e3e76967e – Clyde D'Cruz Mar 24 '20 at 18:08
8

You can configure Hibernate to do bulk DML. Have a look at Spring Data JPA - concurrent Bulk inserts/updates. I think section 2 of the answer could solve your problem:

Enable the batching of DML statements Enabling the batching support would result in less number of round trips to the database to insert/update the same number of records.

Quoting from batch INSERT and UPDATE statements:

hibernate.jdbc.batch_size = 50

hibernate.order_inserts = true

hibernate.order_updates = true

hibernate.jdbc.batch_versioned_data = true

UPDATE: You have to set the hibernate properties differently in your application.properties file. They are under the namespace: spring.jpa.properties.*. An example could look like the following:

spring.jpa.properties.hibernate.jdbc.batch_size = 50
spring.jpa.properties.hibernate.order_inserts = true
....
Community
  • 1
  • 1
rieckpil
  • 10,470
  • 3
  • 32
  • 56
  • Thanks for the suggestion. I tried it out, but it did not work. I've added a minimal code example to my question to reproduce the problem, even with your settings. – Tobias Hermann Jun 09 '18 at 17:13
  • Thanks, I adjusted my configuration (and updated my question accordingly), but still no luck. – Tobias Hermann Jun 09 '18 at 20:00
  • have you tried it with a different database or is your H2 a requirement? @TobiasHermann I would suggest trying it with a MySQL database next. Not all database driver implement the JDBC batch insert/update properly – rieckpil Jun 12 '18 at 04:39
  • I tried using CockroachDB 2.0.2. It supports multi-row inserts and is about 10 times faster when I manually create the needed `java.sql.PreparedStatement` in my application and send it out using the raw `java.sql.Connection` of the `javax.sql.DataSource`. – Tobias Hermann Jun 12 '18 at 14:55
  • What this mean spring.jpa.properties.hibernate.order_inserts? – java dev Sep 04 '21 at 11:32
3

All mentioned methods work but will be slow especially if the source for inserted data lies in some other table. Firstly, even with batch_size>1 the insert operation will be executed in multiple SQL queries. Secondly, if the source data lies in the other table you need to fetch the data with other queries (and in the worst case scenario load all data into memory), and convert it to static bulk inserts. Thirdly, with separate persist() call for each entity (even if batch is enabled) you will bloat entity manager first level cache with all these entity instances.

But there's another option for Hibernate. If you use Hibernate as a JPA provider you can fallback to HQL which supports bulk inserts natively with subselect from another table. The example:

Session session = entityManager.unwrap(Session::class.java)
session.createQuery("insert into Entity (field1, field2) select [...] from [...]")
  .executeUpdate();

Whether this will work depends on your ID generation strategy. If the Entity.id is generated by the database (for example MySQL auto increment), it will be executed successfully. If the Entity.id is generated by your code (especially true for UUID generators), it will fail with "unsupported id generation method" exception.

However, in the latter scenario this problem can be overcome by custom SQL function. For example in PostgreSQL I use uuid-ossp extension which provides uuid_generate_v4() function, which I finally register in my custom dialog:

import org.hibernate.dialect.PostgreSQL10Dialect;
import org.hibernate.dialect.function.StandardSQLFunction;
import org.hibernate.type.PostgresUUIDType;

public class MyPostgresDialect extends PostgreSQL10Dialect {

    public MyPostgresDialect() {
        registerFunction( "uuid_generate_v4", 
            new StandardSQLFunction("uuid_generate_v4", PostgresUUIDType.INSTANCE));
    }
}

And then I register this class as a hibernate dialog:

hibernate.dialect=MyPostgresDialect

Finally I can use this function in bulk insert query:

SessionImpl session = entityManager.unwrap(Session::class.java);
session.createQuery("insert into Entity (id, field1, field2) "+
  "select uuid_generate_v4(), [...] from [...]")
  .executeUpdate();

The most important is the underlying SQL generated by Hibernate to accomplish this operation and it's just a single query:

insert into entity ( id, [...] ) select uuid_generate_v4(), [...] from [...]
Lukasz Frankowski
  • 2,955
  • 1
  • 31
  • 32
1

I faced the same issue but I couldn't see my hibernate queries in batch, I realized that query doesn't translate to what was really querying. But to be sure that is bulking you can enable to generate statistics spring.jpa.properties.hibernate.generate_statistics=true then you will see:

enter image description here

when you add the spring.jpa.properties.hibernate.jdbc.batch_size=100 you will start to see some differences, like less jdbc statements and more jdbc batches:

enter image description here

Guilherme Alencar
  • 1,243
  • 12
  • 21
  • I've spent quite some time figuring out where my configuration issue is before checking the hibernate statistics when first running into this issue. Thanks for letting me know that I was not the only one not going beyond hibernate initial logs. – void Mar 02 '23 at 14:39
  • I have been there @void, thanks for the feedback – Guilherme Alencar Mar 03 '23 at 15:30
0

Hibernate uses transaction-write-behind strategy to automatically perform batch insert, update or delete.

But only setting the property spring.jpa.properties.hibernate.jdbc.batch_size=100 will not work alone. we also have to set the ID generator as @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seq_generator")

If we use GenerationType.AUTO or GenerationType.IDENTITY on entity in this case batch insert, update will not work. Because in this case hibernate don't know about the Id value that to be inserted, because it is generated on DB level, so it disable the batch insertion and individual insert takes place.

So for using batch insert, update our entity should have Id generator as Sequence.