38

What is the difference between delete(...) and deleteInBatch(...) methods in JpaRepostory in Spring ? The second one "deletes items in one SQL statement", but what does it mean from the application/database perspective ? Why exists two different methods with the similar results and when it is better to use one or other ?

EDIT: The same applies also for deleteAll() and deleteAllInBatch() ...

mpm
  • 3,534
  • 23
  • 33
kulatamicuda
  • 1,603
  • 2
  • 21
  • 40

4 Answers4

34

The answers here are not complete!

First off, let's check the documentation!

void deleteInBatch(Iterable<T> entities)
Deletes the given entities in a batch which means it will create a single Query.

So the "delete[All]InBatch" methods will use a JPA Batch delete, like "DELETE FROM table [WHERE ...]". That may be WAY more efficient, but it has some caveats:

  • This will not call any JPA/Hibernate lifecycle hooks you might have (@PreDelete)
  • It will NOT CASCADE to other entities
  • You have to clear your persistence context or just assume it is invalidated.

That's because JPA will issue a bulk DELETE statement to the database, bypassing the cache etc. and thus can't know which entities were affected.

See Hibernate Docs

The actual code in Spring Data JPA

And while I can't find a specific article here, I'd recommend everything Vlad Mihalcea has written, to gain a deeper understanding of JPA.

TLDR: The "inBatch" methods use bulk DELETE statements, which can be drastically faster, but have some caveats bc. they bypass the JPA cache. You should really know how they work and when to use them to benefit.

Benjamin Maurer
  • 3,602
  • 5
  • 28
  • 49
19

The delete method is going to delete your entity in one operation. The deleteInBatch is going to batch several delete-statements and delete them as 1 operation.

If you need a lot of delete operations the batch-deletion might be faster.

Kurt Du Bois
  • 7,550
  • 4
  • 25
  • 33
  • 5
    Ok, I understand this. But it still not answer why exists two methods ? Everyone wants to performs everything faster and faster these days so why not to use eg. deleteAllInBatch implementation as default implementation of deleteAll and KISS the whole JPARepository API. – kulatamicuda Oct 01 '14 at 13:23
  • 6
    Because batch processing has a performance impact. It is only faster than individual queries from a certain number of operations. – Kurt Du Bois Oct 01 '14 at 13:30
  • That explains all. Thanks. – kulatamicuda Oct 01 '14 at 13:32
  • 6
    delete removes one item, deleteAll removes all items, one by one (it calls "delete from where id=?" N times), deleteInBatch removes all items by single sql delete operation: "delete from
    ". I believe that here is no performance impact for deleteInBatch, if we compare it with delete() or deleteAll()
    – DraganS Mar 04 '15 at 23:03
  • 1
    If your entity contains @ElementCollection you should use deleteAll() instead deleteAllInBatch() – Volodymyr Yasinskyi Aug 08 '17 at 07:40
  • I prefer `deleteAllInBatch` over `deleteAll` because I had the issue that I got an "Unique index or primary key violation" after a `repo.deleteAll(); repo.save(..someList..);` call. (saving the same list two times but I was expecting that all entities are deleted) – timguy Aug 24 '17 at 07:50
  • @VolodymyrYasinskyi Is there a specific reason for that? – Arpit Agarwal Nov 16 '18 at 02:21
  • 4
    This is not a complete answer. It's very simplified and doesn't even provide documentation. I haven't researched this yet but I would bet that using the batch method there will be lots of things skipped like application-provided entity listeners and the persistence-provider's validation logic. – Martin Andersson Jul 11 '19 at 08:35
14

deleteInBatch(...) in the log would look like this: DELETE FROM table_name WHERE (((((((? = id) OR (? = id)) OR (? = id)) OR (? = id)) OR (? = id)) OR (? = id)) OR (? = id))

That might leads to a problem if there are a large amount of data to be deleted, which reaches maximum size of the SQL server query: Maximum size for a SQL Server Query? IN clause? Is there a Better Approach

Evelyn D
  • 151
  • 1
  • 4
  • 2
    Confirmed, large deleteInBatch invocations result in a StackOverflowError. To avoid this, a custom query should be written to use an IN statement, or the batch should be broken into smaller chunks. – Michael Barrett Sep 21 '19 at 05:37
  • use "spring.jpa.properties.hibernate.jdbc.batch_size" property – Fozix Jun 01 '23 at 09:52
0

Just do add curious information.

You can't create your custom delete using 'batch' on the method name and wait for spring data to resolve it, for example, you can't do this:

  void deleteByYourAttributeInBatch(Iterable<YourObject> object);

Do you need to do something like this:

  @Modifying
  @Transactional
  @Query("DELETE FROM YourObject qr WHERE o.yourAtribute IN (:object)")
  void deleteByYourAttributeInBatch(Iterable<YourObject> o);

Maybe it's an issue to spring-data ;)

Clayton K. N. Passos
  • 1,292
  • 12
  • 15
  • It's probably not a good idea to name a repository method `deleteByYourAttributeInBatch` when it doesn't actually do any batching. – Dan Bechard Apr 12 '22 at 19:45