What transactions diminish data integrity?

Question

I've learned a decent bit about database integrity, and I know I should be using transactions if I "require multiple statements be performed as a unit to keep the data in a consistent state" (from "Database development mistakes made by application developers", point 16 of the chosen answer).

Wikipedia uses the example:

Debit $100 to Groceries Expense Account
Credit $100 to Checking Account

If I try to credit a non-existent account ID, and I'm using constraints properly, an exception will be thrown and I can catch it and roll back. If there is a power outage these two changes are guaranteed to be atomic.

However, if I understand properly, transactions by themselves won't help me in all cases: (example with PHP and MySQL)

1. MySQL: Start transaction
2. MySQL: Select data from a table
3. PHP: Compute state with the selected data
   - PHP: If the state is valid, insert data
   - PHP: Otherwise, don't insert data
4. MySQL: Commit transaction

This won't work because the queries can be executed together atomically without failing (it's PHP that decides that there's an error, not some SQL constraint).

Secondly, and I just tested, transactions are committed synchronously, but can be started asynchronously. If I start a transaction, and add a 10 second delay, I can start the slow script, and start and commit another transaction in that time, demonstrating concurrent transactions. Two instances can select the same data, before seeing the other's modifications. Only the modifications are guaranteed to be atomic.

So what can I do? I suppose locking a table works, but is that good practice? Some conditions can be described with SQL in a single statement, but more complex ones can't.

John Tseng · Accepted Answer · 2013-07-23T15:27:05.010

This is a good question. Shows that you've been thinking about it a bit.

The problem you are describing exists because the database is not aware of your data dependencies. To the database, your code selects some data and writes some data. It doesn't know you are only writing that data based on the data selected. In general, you need to tell the database about your data dependencies. This is done differently in each database.

You mentioned MySQL. InnoDB supports SELECT ... FOR UPDATE. This acquires locks on the rows it reads so that other transactions cannot take conflicting locks on them (exactly what is blocked depends on the transaction isolation level). In your example, the second transaction would then have to wait until the first one commits, provided both are locking the same resources. Which resources get locked is up to the database.

Let's look at an example. To lock the rows, you would first create a transaction and query the database with something like:

select * from tableA where value > 50 for update

This select locks the matching rows, so any other transaction that requests an incompatible lock on them will block. Then you can do the processing in PHP. Once you are ready, you can insert rows into another table:

insert into tableB values ('some value')

At this point, before you commit, all of these rows are locked. Other clients cannot modify them, or lock them with their own select for update, until your transaction ends. Note that in InnoDB a plain select without a locking clause is normally a non-locking consistent read: it is not blocked and simply sees the last committed data, so only writes and locking reads will wait on you. To make this work in your example, make sure every select statement in step 2 (the "select data from a table" step) uses select for update.
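Here is a minimal sketch of the lock-then-check-then-insert shape, written in Python with SQLite only so that it runs on its own. SQLite has no SELECT ... FOR UPDATE, so the sketch swaps in `BEGIN IMMEDIATE`, which takes a database-wide write lock before the read; that is a much coarser stand-in for InnoDB's row locks. The `expenses` table, the `LIMIT` rule, and the `insert_if_under_limit` helper are all made up for illustration.

```python
import sqlite3

LIMIT = 100  # hypothetical business rule: total expenses may not exceed this


def insert_if_under_limit(conn, amount):
    """Read, decide in application code, then write, all under one lock."""
    # Take the write lock BEFORE reading, so no other writer can change
    # the data between our check and our insert.
    conn.execute("BEGIN IMMEDIATE")
    try:
        total = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM expenses"
        ).fetchone()[0]
        if total + amount > LIMIT:
            # The application, not a constraint, decides the state is invalid.
            conn.execute("ROLLBACK")
            return False
        conn.execute("INSERT INTO expenses (amount) VALUES (?)", (amount,))
        conn.execute("COMMIT")
        return True
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE expenses (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)"
)

results = [
    insert_if_under_limit(conn, 60),
    insert_if_under_limit(conn, 60),
    insert_if_under_limit(conn, 40),
]
total = conn.execute("SELECT SUM(amount) FROM expenses").fetchone()[0]
```

In MySQL the same shape would presumably keep the ordinary start of the transaction and put `for update` on the select instead, so that only the rows read are locked rather than everything.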

The other way to do this is to tell the database in the update statement. When you issue the update, you also specify what you think the data should currently be. If the database updates the expected rows, you can be sure that nothing else has changed your data. If it doesn't update the expected number of rows, you know that someone else has changed your data, and you should handle that case. This is optimistic concurrency: you guess that probably no one else will update your data, so you go ahead with your change and check afterwards whether someone actually did.

The query would be like:

select value from tableA where id = '1'

then later:

update tableA set value = 'new value' where id = '1' and value = 'old value'
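To make the "check how many rows were updated" step concrete, here is a small sketch in Python with SQLite, chosen only so it runs without a server. The `items` table and the `optimistic_update` helper are assumptions for illustration; the SQL is the same compare-in-the-where-clause idea as the statements above.

```python
import sqlite3


def optimistic_update(conn, row_id, expected, new):
    """Write `new` only if the row still holds the value we read earlier."""
    cur = conn.execute(
        "UPDATE items SET value = ? WHERE id = ? AND value = ?",
        (new, row_id, expected),
    )
    conn.commit()
    # Zero rows updated means someone changed the row after we read it.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'old value')")
conn.commit()

# Read the current value.
seen = conn.execute("SELECT value FROM items WHERE id = 1").fetchone()[0]

# Simulate another client changing the row between our read and our write.
conn.execute("UPDATE items SET value = 'changed elsewhere' WHERE id = 1")
conn.commit()

# Our write is based on stale data, so it must not go through.
stale_ok = optimistic_update(conn, 1, seen, "new value")

# Handle the conflict: re-read and try again with the fresh value.
seen = conn.execute("SELECT value FROM items WHERE id = 1").fetchone()[0]
retry_ok = optimistic_update(conn, 1, seen, "new value")

final = conn.execute("SELECT value FROM items WHERE id = 1").fetchone()[0]
```

Re-reading and retrying is only one way to handle the conflict; reporting the failure to the user is another.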

Other databases give you other options on these two basic ideas. For example, on the optimistic model, you can verify a timestamp (or autoincrement) value instead of the actual values.
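A sketch of the version-counter variant, again in Python with SQLite purely so it is self-contained. The `accounts` table, its `version` column, and both helper functions are hypothetical; the point is only that the update compares and bumps a counter instead of comparing the actual values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL,"
    " version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.commit()


def read_account(conn, account_id):
    """Return (balance, version) as currently stored."""
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def write_balance(conn, account_id, new_balance, seen_version):
    """Write only if the version is unchanged; bump it when we do write."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1"
        " WHERE id = ? AND version = ?",
        (new_balance, account_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1


# Two clients read the same row before either one writes.
balance_a, version_a = read_account(conn, 1)
balance_b, version_b = read_account(conn, 1)

first = write_balance(conn, 1, balance_a - 30, version_a)   # version matches
second = write_balance(conn, 1, balance_b - 50, version_b)  # version is stale

final_balance, final_version = read_account(conn, 1)
```

The second write is rejected even though it never looked at the balance column, which is what makes a counter convenient when the row has many columns to compare.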

Hmmm, if I understand this is only useful for simple cases though (which is not what the question asks). For example, `SELECT ... FOR UPDATE` is crucial many times, but it only works when you're working with one row. Same with `select... where value=old value`. Good to know it, but what if I need to select a lot of data and compute a result on that to decide whether to move forward? Is table locking acceptable in these cases (it seems like something I should be very careful with)? — Raekye, Jul 23 '13 at 04:52
@Raekye Both techniques can be used with multiple rows, but select for update will be much easier to work with as it locks all selected rows. — John Tseng, Jul 23 '13 at 04:59
I mean if I'm selecting data from multiple rows - even tables, that are not being updated, but used to calculate a condition to see if another row should be inserted — Raekye, Jul 23 '13 at 14:36
@Raekye That's exactly the situation that SELECT ... FOR UPDATE is used for. It acquires exclusive locks on all rows from all tables that it reads from. No other clients can read the data. I've updated the answer with an example. Hopefully that will make it a little more clear. I want to also clarify that this is only for multiple clients. If you have only one client with multiple transactions, i.e. nested transactions, then the behavior is different. — John Tseng, Jul 23 '13 at 15:30
ah I forgot about that (that it locks all rows it reads from)... that seems right... it'll take me time to wrap my head around it :P But it seems like the right answer (+1 for now, let me think about it) — Raekye, Jul 23 '13 at 16:37
I ran some simple tests with `select ... for update`, and it "works", contrary to my understanding. I asked another question here http://stackoverflow.com/questions/17816799 that maybe you'll know the answer to — Raekye, Jul 23 '13 at 17:18
