
I'm trying to understand why saveAll has better performance than save in the Spring Data repositories. I'm using CrudRepository which can be seen here.

To test this, I created 10k entities, each with just an id and a random string (for the benchmark I kept the string constant), and added them to a list. Iterating over the list and calling .save on each element took 40 seconds. Calling .saveAll on the same list completed in 2 seconds, and calling .saveAll with 30k elements took 4 seconds. I truncated the table before each test. Even splitting the 30k elements into sublists of 50 and calling .saveAll on each sublist took 10 seconds.

The simple .saveAll with the entire list seems to be the fastest.

I tried to browse the Spring Data source code but this is the only thing I found of value. Here it seems .saveAll simply iterates over the entire Iterable and calls .save on each one like I was doing. So how is it that much faster? Is it doing some transactional batching internally?

Yottabyte

1 Answer


Without your code I have to guess, but I believe it comes down to the overhead of creating a new transaction for each object saved with save, versus opening a single transaction with saveAll.

Notice that save and saveAll are both annotated with @Transactional in their definitions. If your project is configured properly, which seems to be the case since entities are being saved to the database, a transaction will be created whenever one of these methods is called. If you call save in a loop, a new transaction is created on every call, but with saveAll there is one call and therefore one transaction, regardless of the number of entities being saved.

I'm assuming that the test itself is not run within a transaction. If it were, all calls to save would run within that transaction, since the default transaction propagation is Propagation.REQUIRED: if a transaction is already open, the calls join it. If you're planning to use Spring Data, I strongly recommend that you read about transaction management in Spring.
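
To make the reasoning concrete, here is a minimal sketch in plain Java. It does not use Spring at all: FakeTxManager and FakeRepository are hypothetical stand-ins I made up to imitate REQUIRED-style propagation, and the sketch only counts how many transactions get started. It says nothing about actual timings, which depend on your database and configuration.

```java
import java.util.ArrayList;
import java.util.List;

public class TxPropagationSketch {

    /** Hypothetical stand-in for a transaction manager; it only counts begun transactions. */
    static final class FakeTxManager {
        private boolean active = false;
        private int begun = 0;

        /** REQUIRED-like semantics: join the current transaction, or begin one if none exists. */
        void required(Runnable work) {
            if (active) {
                work.run(); // join the transaction that is already open
                return;
            }
            active = true;
            begun++;
            try {
                work.run();
            } finally {
                active = false; // a real manager would commit or roll back here
            }
        }

        int begun() {
            return begun;
        }
    }

    /** Hypothetical repository whose methods behave as if annotated with @Transactional. */
    static final class FakeRepository {
        private final FakeTxManager tx;
        private final List<String> rows = new ArrayList<>();

        FakeRepository(FakeTxManager tx) {
            this.tx = tx;
        }

        void save(String entity) {
            tx.required(() -> rows.add(entity));
        }

        void saveAll(List<String> entities) {
            // One transactional boundary around the whole loop.
            tx.required(() -> {
                for (String e : entities) {
                    save(e);
                }
            });
        }
    }

    private static List<String> sample(int n) {
        List<String> entities = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            entities.add("entity-" + i);
        }
        return entities;
    }

    /** Caller loops over save() with no surrounding transaction. */
    public static int transactionsForSaveLoop(int n) {
        FakeTxManager tx = new FakeTxManager();
        FakeRepository repo = new FakeRepository(tx);
        for (String e : sample(n)) {
            repo.save(e);
        }
        return tx.begun();
    }

    /** Caller makes a single saveAll() call. */
    public static int transactionsForSaveAll(int n) {
        FakeTxManager tx = new FakeTxManager();
        FakeRepository repo = new FakeRepository(tx);
        repo.saveAll(sample(n));
        return tx.begun();
    }

    /** Caller loops over save(), but inside a transaction it opened itself. */
    public static int transactionsForSaveLoopInOuterTx(int n) {
        FakeTxManager tx = new FakeTxManager();
        FakeRepository repo = new FakeRepository(tx);
        List<String> entities = sample(n);
        tx.required(() -> {
            for (String e : entities) {
                repo.save(e);
            }
        });
        return tx.begun();
    }

    public static void main(String[] args) {
        System.out.println("save loop:             " + transactionsForSaveLoop(1000));
        System.out.println("saveAll:               " + transactionsForSaveAll(1000));
        System.out.println("save loop in outer tx: " + transactionsForSaveLoopInOuterTx(1000));
    }
}
```

In this model the bare save loop starts one transaction per entity, while saveAll and the loop wrapped in an outer transaction each start exactly one. In a real application the outer transaction would typically come from a @Transactional service method or test.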

Sofia Paixão
Yazan Jaber
  • 2
    This is not fully accurate. @Transactional default propagation level is Required, which means that a transaction should exist before the actual call of this method. CrudRepository will not create a transaction by itself, because of that propagation level. – edward_wong Dec 18 '19 at 13:13
  • 5
    @edward_wong Who actually creates the transaction is irrelevant to his question, and I didn't say it was CrudRepository that created the transaction. Still, mentioning the default propagation is important because, unlike Propagation.REQUIRES_NEW, which always requires the creation of a new transaction, Propagation.REQUIRED will continue using a transaction created in the outer scope (among other semantics that are also irrelevant to his question), which explains the performance difference between calling save() directly in his own loop and calling saveAll(). – Yazan Jaber Dec 24 '19 at 00:52
  • @edward_wong "Propagation REQUIRED: Support a current transaction, create a new one if none exists. Analogous to EJB transaction attribute of the same name." So it will create new, by default. https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/transaction/annotation/Propagation.html#REQUIRED – Misiek Mar 26 '20 at 10:38
  • Does that mean that if the service method is already annotated with @Transactional, then calling save() multiple times or saveAll() once inside it will have the same performance? – A MJ Mar 22 '23 at 17:39