5

Performance is key: Is it better to cascade deletes/updates inside of the Database or let Hibernate/JPA take care of it?

Will this effect the ability to query for the data if cascades are inside of the DBMS?

I am using HSQLDB if that matters.

Pascal Thivent
  • 562,542
  • 136
  • 1,062
  • 1,124
mainstringargs
  • 13,563
  • 35
  • 109
  • 174
  • I'm having pretty much the same problem. So far it appears nobody can really tell what is the corelation between JPA/DB cascades in terms of performance. – Jacek Prucia Apr 15 '15 at 15:54

3 Answers3

2

In the case of cascading updates, you simply cannot do it in application space if you have foreign key constraints in the database.

Example: say you have a lookup table for US states, with a primary key of the two-letter abbreviation. Then you have a table for mailing addresses that references it. Someone tells you that you mistakenly gave Montana the abbreviation "MO" instead of "MT" so you need to change it in the lookup table.

CREATE TABLE States (st CHAR(2) PRIMARY KEY, state VARCHAR(20) NOT NULL);
INSERT INTO States VALUES ('MO', 'Montana');

CREATE TABLE Addresses (addr VARCHAR(20), city VARCHAR(20), st CHAR(2), zip CHAR(6),
  FOREIGN KEY (st) REFERENCES States(st));
INSERT INTO Addresses VALUES ('1301 East Sixth Ave.', 'Helena', 'MO', '59620');

Now you go to fix the mistake, without the aid of database-side cascading updates. Below is a test using MySQL 5.0 (assume no records exist for Missouri, which actually does use the abbreviation "MO").

UPDATE States SET st = 'MT' WHERE st = 'MO';

ERROR 1451 (23000): Cannot delete or update a parent row: 
 a foreign key constraint fails (`test/addresses`, 
 CONSTRAINT `addresses_ibfk_1` FOREIGN KEY (`st`) REFERENCES `states` (`st`))

UPDATE Addresses SET st = 'MT' WHERE st = 'MO';

ERROR 1452 (23000): Cannot add or update a child row: 
 a foreign key constraint fails (`test/addresses`, 
 CONSTRAINT `addresses_ibfk_1` FOREIGN KEY (`st`) REFERENCES `states` (`st`))

UPDATE Addresses JOIN States USING (st)
SET Addresses.st = 'MT', States.st = 'MT'
WHERE States.st = 'MO';

ERROR 1451 (23000): Cannot delete or update a parent row: 
 a foreign key constraint fails (`test/addresses`, 
 CONSTRAINT `addresses_ibfk_1` FOREIGN KEY (`st`) REFERENCES `states` (`st`))

No application-side query can solve this situation. You need cascading updates in the database in order to perform the update in both tables atomically, before the referential integrity constraint is enforced.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Really? How about that: `INSERT INTO States VALUES ('MT', 'Montana'); UPDATE Addresses SET st = 'MT' WHERE st = 'MO'; DELETE FROM States WHERE st = 'MO'`. You can argue that this is not as efficient as done by RDBMS (which is true), but claiming that it is not possible in application space is just wrong. – Jacek Prucia Apr 15 '15 at 15:39
  • @JacekPrucia, the question was about cascading updates. Of course you can do it manually in multiple steps, and you can use a transaction to make the multi-step change atomic. But the advantage of using cascading updates is that it automatically finds the 37 other tables that reference `States` and updates those too. That would be awkward if you do it in multiple statements like you show, and if you add a 38th table, you'd have to modify your code. Whereas cascading updates would implicitly do that. – Bill Karwin Apr 15 '15 at 16:20
0
  1. Correctness is key

That said in this instance correctness and performance will almost certainly go hand in hand since the database is the right place to put (and enforce) hard constraints on data integrity and it will be measurably faster on a serious delete cascade since it:

  • Avoids multiple roundtrips to the database
  • Reduces the time during which a transaction may have to be held
  • Can make use of internal structures related to the foreign key index to do it without having to workout/reuse some execution plan
ShuggyCoUk
  • 36,004
  • 6
  • 77
  • 101
0

UPDATE: looks like similar question already was answered here When/Why to use Cascading in SQL Server?

IMO correct answer on your question will be as usual "it depends". If you are using a database as a storage of sensitive information (e.g. financial, medical etc.), or if someone else outside of your application could have access to the database I will vote for Hibernate/JPA approach. If your database is for logging (e.g. web site traffic etc.) or if you are developing software with embedded database you can relatively safely use cascade operations.

2. In most cases I will vote for Hibernate/JPA approach because it's more manageable and predictable.

I tell you a story. Some times ago young country decided to change national currency (it happened with young countries). Couple years later new DBA saw in a currency table row with obsolete currency and decided to delete it (who knows why). Guess what happened? 30% of database was deleted because of cascade deletion operations. IMO with cascade operations you have to be super careful with all you delete/update statements from one hand and you lose power of constrains (i.e. foreign keys) for database validation from other hand.

Community
  • 1
  • 1
FoxyBOA
  • 5,788
  • 8
  • 48
  • 82
  • 2
    I just point out that change of national currency are likely to occur many time in future years, and not only for young countries, just think to the Euro zone. – snowflake Mar 08 '10 at 15:06
  • The "removed one row, lost 30% of database" is not an argument against RDBMS cascading, but rather proof that this particular DBA wasn't familliar with his DB structure. You also missed core aspect of the question -- "Performance is key" as nothing in your answer relates to that. Resisted the temptation to downvote though :) – Jacek Prucia Apr 15 '15 at 15:29