I want to insert a large number of entities into cross table. To do that, I want to enable hibernate's batch insert option, so that each insert will add not 1, but 20 or 50 rows at a time.
I'm trying to reproduce the results from hibernate's tutorial. I've created a test function that will try to insert 30 customers, just like in example:
//Batch inserting test entities
//This code is inside transaction already, so I'm not creating a new one
entityManager.flush();
for (int i = 0; i < 30; ++i) {
Customer customer = new Customer("Customer" + i, (i + 5) * 1000, "Position " + i);
entityManager.persist(customer);
if (i % 20 == 0) {
entityManager.flush();
entityManager.clear();
}
}
entityManager.flush();
entityManager.clear();
The Customer class has no generated id.
@Entity
public class Customer {
@Id
private String id;
@Column
private String name;
@Column
private long salary;
@Column
private String position;
public Customer() {
}
public Customer(String name, long salary, String position) {
this.id = UUID.randomUUID().toString();
this.name = name;
this.salary = salary;
this.position = position;
}
}
What I'm expecting to see is 2 insert statements, one for 20 records and one for 10 records. However, when I open postgres logs, I see 30 insert statements, each inserting only 1 row. I've double checked that it's possible to insert multiple rows using postgres.
I believe that the problem is caused by the hibernate.jdbc.batch_size 20
parameter, which I'm supposed to pass to hibernate. However, since I'm using spring boot, the only configuration file I have is application.properties. So I've tried inserting it there:
hibernate.jdbc.batch_size=20
hibernate.order_inserts=true
hibernate.order_updates=true
hibernate.jdbc.batch_versioned_data=true
spring.jpa.hibernate.jdbc.batch_size=20
spring.jpa.hibernate.order_inserts=true
spring.jpa.hibernate.order_updates=true
spring.jpa.hibernate.jdbc.batch_versioned_data=true
spring.jpa.properties.hibernate.jdbc.batch_size=20
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true
According to this answer, this should be enough, but
I've also tried copying these parameters to the hibernate.properties
file inside my jar and supplied the parameter via command line: -Dhibernate.jdbc.batch_size=20
as stated in this documentation.
None of this helps.
I also can't find a way to read batch size property in my code. It seems that most objects mentioned in hibernate's documentation doesn't exist in spring boot application.
How can I enable hibernate's batch insert inside spring boot application?
I've created a minimal working example which reproduces the issue: https://github.com/Alexey-/spring-boot-batch
After receiving the answer from Vlad Mihalcea I realized that yes, in fact batch statements are working, at least on hibernate and jdbc levels.
However, postgres logs themselves show very interesting behavior: they are almost identical for the case of batch inserts and normal inserts.
What I was initially expecting to see was that hibernate would be using statements like this:
test=# INSERT INTO customer (name, position, salary, id) values ('CustomerX', 'PositionX', '1000', 'idX'), ('CUSTOMERY', 'POSITIONY', '1000', 'idY');
Which would produce log records similar to this:
2015-12-15 11:43:33.238 MSK LOG: statement: INSERT INTO customer (name, position, salary, id) values ('CustomerX', 'PositionX', '1000', 'idX'), ('CUSTOMERY', 'POSITIONY', '1000', 'idY');
However, that's not the case.
When batch insertion is enabled (p6spy shows that statements are, in fact, batched), postgres will produce logs similar to this:
2015-12-15 12:07:00.638 MSK LOG: execute <unnamed>: BEGIN
2015-12-15 12:07:00.638 MSK LOG: duration: 0.000 ms
2015-12-15 12:07:00.638 MSK LOG: duration: 0.000 ms parse <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.638 MSK LOG: duration: 0.000 ms bind <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.638 MSK DETAIL: parameters: $1 = 'Customer0', $2 = 'Position 0', $3 = '0', $4 = '9c6a86fb-c991-4e98-aa65-fa736ef67dd7'
2015-12-15 12:07:00.638 MSK LOG: execute <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.638 MSK DETAIL: parameters: $1 = 'Customer0', $2 = 'Position 0', $3 = '0', $4 = '9c6a86fb-c991-4e98-aa65-fa736ef67dd7'
2015-12-15 12:07:00.639 MSK LOG: duration: 1.000 ms
2015-12-15 12:07:00.648 MSK LOG: duration: 0.000 ms parse S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.648 MSK LOG: duration: 0.000 ms bind S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.648 MSK DETAIL: parameters: $1 = 'Customer1', $2 = 'Position 1', $3 = '10', $4 = 'c8b2669c-044a-4a4d-acbd-31c3bcd9a783'
2015-12-15 12:07:00.648 MSK LOG: execute S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.648 MSK DETAIL: parameters: $1 = 'Customer1', $2 = 'Position 1', $3 = '10', $4 = 'c8b2669c-044a-4a4d-acbd-31c3bcd9a783'
2015-12-15 12:07:00.648 MSK LOG: duration: 0.000 ms
2015-12-15 12:07:00.648 MSK LOG: duration: 0.000 ms bind S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.648 MSK DETAIL: parameters: $1 = 'Customer2', $2 = 'Position 2', $3 = '20', $4 = '1c694f41-2ce7-4ee2-a0c0-f359690506f0'
2015-12-15 12:07:00.649 MSK LOG: execute S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.649 MSK DETAIL: parameters: $1 = 'Customer2', $2 = 'Position 2', $3 = '20', $4 = '1c694f41-2ce7-4ee2-a0c0-f359690506f0'
2015-12-15 12:07:00.649 MSK LOG: duration: 0.000 ms
2015-12-15 12:07:00.649 MSK LOG: duration: 0.000 ms bind S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.649 MSK DETAIL: parameters: $1 = 'Customer3', $2 = 'Position 3', $3 = '30', $4 = '1815947d-2604-48d4-a6be-43f6905130cf'
2015-12-15 12:07:00.649 MSK LOG: execute S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.649 MSK DETAIL: parameters: $1 = 'Customer3', $2 = 'Position 3', $3 = '30', $4 = '1815947d-2604-48d4-a6be-43f6905130cf'
2015-12-15 12:07:00.649 MSK LOG: duration: 0.000 ms
2015-12-15 12:07:00.649 MSK LOG: duration: 0.000 ms bind S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.649 MSK DETAIL: parameters: $1 = 'Customer4', $2 = 'Position 4', $3 = '40', $4 = 'cc521007-820f-4d58-8e1a-16a166aa91cf'
2015-12-15 12:07:00.649 MSK LOG: execute S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.649 MSK DETAIL: parameters: $1 = 'Customer4', $2 = 'Position 4', $3 = '40', $4 = 'cc521007-820f-4d58-8e1a-16a166aa91cf'
2015-12-15 12:07:00.649 MSK LOG: duration: 0.000 ms
... the rest of the logs is identical and do not provide any valuable information...
And when batch statements are disabled (p6spy shows that no batching is performed), logs will looks like this:
2015-12-15 12:09:00.246 MSK LOG: execute <unnamed>: BEGIN
2015-12-15 12:09:00.246 MSK LOG: duration: 0.000 ms
2015-12-15 12:09:00.246 MSK LOG: duration: 0.000 ms parse <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.246 MSK LOG: duration: 0.000 ms bind <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.246 MSK DETAIL: parameters: $1 = 'Customer0', $2 = 'Position 0', $3 = '0', $4 = '9e085ad0-437f-4d7d-afaa-e342e031cbee'
2015-12-15 12:09:00.246 MSK LOG: execute <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.246 MSK DETAIL: parameters: $1 = 'Customer0', $2 = 'Position 0', $3 = '0', $4 = '9e085ad0-437f-4d7d-afaa-e342e031cbee'
2015-12-15 12:09:00.246 MSK LOG: duration: 0.000 ms
2015-12-15 12:09:00.248 MSK LOG: duration: 0.000 ms parse <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.248 MSK LOG: duration: 0.000 ms bind <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.248 MSK DETAIL: parameters: $1 = 'Customer1', $2 = 'Position 1', $3 = '10', $4 = 'f29cfa40-7d24-49a6-ae5d-2a2021932d80'
2015-12-15 12:09:00.248 MSK LOG: execute <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.248 MSK DETAIL: parameters: $1 = 'Customer1', $2 = 'Position 1', $3 = '10', $4 = 'f29cfa40-7d24-49a6-ae5d-2a2021932d80'
2015-12-15 12:09:00.249 MSK LOG: duration: 1.000 ms
2015-12-15 12:09:00.250 MSK LOG: duration: 0.000 ms parse <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.250 MSK LOG: duration: 0.000 ms bind <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.250 MSK DETAIL: parameters: $1 = 'Customer2', $2 = 'Position 2', $3 = '20', $4 = '067dd6d4-5060-467f-b533-75994ecbaedc'
2015-12-15 12:09:00.250 MSK LOG: execute <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.250 MSK DETAIL: parameters: $1 = 'Customer2', $2 = 'Position 2', $3 = '20', $4 = '067dd6d4-5060-467f-b533-75994ecbaedc'
2015-12-15 12:09:00.250 MSK LOG: duration: 0.000 ms
2015-12-15 12:09:00.250 MSK LOG: duration: 0.000 ms parse <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.250 MSK LOG: duration: 0.000 ms bind <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.251 MSK DETAIL: parameters: $1 = 'Customer3', $2 = 'Position 3', $3 = '30', $4 = '7df32327-f2f5-4011-848d-55aafb3f09fa'
2015-12-15 12:09:00.251 MSK LOG: execute <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.251 MSK DETAIL: parameters: $1 = 'Customer3', $2 = 'Position 3', $3 = '30', $4 = '7df32327-f2f5-4011-848d-55aafb3f09fa'
2015-12-15 12:09:00.251 MSK LOG: duration: 0.000 ms
2015-12-15 12:09:00.251 MSK LOG: duration: 0.000 ms parse S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.251 MSK LOG: duration: 0.000 ms bind S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.251 MSK DETAIL: parameters: $1 = 'Customer4', $2 = 'Position 4', $3 = '40', $4 = '1e55ab6a-8780-4c8f-8af2-2886d954f819'
2015-12-15 12:09:00.251 MSK LOG: execute S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.251 MSK DETAIL: parameters: $1 = 'Customer4', $2 = 'Position 4', $3 = '40', $4 = '1e55ab6a-8780-4c8f-8af2-2886d954f819'
2015-12-15 12:09:00.251 MSK LOG: duration: 0.000 ms
... the rest of the logs is identical and do not provide any valuable information...
So, the only difference between the two is when batch is disabled, postgres will take tiny bit more time to realize that it should reuse prepared statements.
So I decided to run performance tests to confirm this.
I tried inserting 30.000 records into empty database.
With batching disabled, it took 334ms.
With batching enabled, it took 4650ms!
So then I removed all calls to entityManager.flush (leaving only entityManager.clear) and the time dropped to 320ms. I don't know why hibernate's tutorial advises using flush. I guess that's just a mistake.
So I guess the bottom line is this: the batching is working, but it doesn't provide any real benefits (at least for postgres). Furthermore, using it improperly (as in tutorial) can lead to a terrible, terrible results. Use at your own risk and always measure real performance increase.