4

I read A beginner’s guide to Read and Write Skew phenomena and A beginner’s guide to Non-Repeatable Read anomaly below to know what read skew and non-repeatable read are.

Read skew:

enter image description here

Non-repeatable read:

enter image description here

But, I cannot differentiate between read skew and non-repeatable read and basically, it seems like both can be prevented with REPEATABLE READ or SERIALIZABLE isolation level.

My questions:

  1. What is the difference between read skew and non-repeatable read?

  2. Can read skew and non-repeatable read be both prevented by REPEATABLE READ or SERIALIZABLE?

Super Kai - Kazuya Ito
  • 22,221
  • 10
  • 124
  • 129

2 Answers2

1
  • Non-repeatable read(fuzzy read) is that a transaction reads the same row at least twice but the same row's data is different between the 1st and 2nd reads because other transactions update the same row's data and commit at the same time(concurrently). *I explain more about non-repeatable read in What is the difference between Non-Repeatable Read and Phantom Read?

  • Read skew is that with two different queries, a transaction reads inconsistent data because between the 1st and 2nd queries, other transactions insert, update or delete data and commit. Finally, an inconsistent result is produced by the inconsistent data.

There are the examples of read skew below which you can do with MySQL and 2 command prompts.

For these examples of read skew below, you need to set READ COMMITTED isolation level to occur read skew:

SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

And, there is bank_account table with id, name and balance as shown below.

bank_account table:

id name balance
1 John 600
2 Tom 400

These steps below shows read skew. *300 is transferred from John's balance to Tom's balance. Then from Tom's balance, T1 reads 100(Inconsistent data) instead of 400. Finally, 600 + 100 = 700(Inconsistent result) in the total of John's and Tom's balances:

Flow Transaction 1 (T1) Transaction 2 (T2) Explanation
Step 1 BEGIN; T1 starts.
Step 2 BEGIN; T2 starts.
Step 3 SELECT balance FROM bank_account WHERE id = 1;

600
T1 reads 600.
Step 4 UPDATE bank_account set balance = 900 WHERE id = 1; T2 updates 600 to 900 because 300 is transferred from Tom's balance.
Step 5 UPDATE bank_account set balance = 100 WHERE id = 2; T2 updates 400 to 100 because 300 is transferred to John's balance.
Step 6 COMMIT; T2 commits.
Step 7 SELECT balance FROM bank_account WHERE id = 2;

100
T1 reads 100(Inconsistent data) instead of 400 after T2 commits.
Step 8 600 + 100 = 700 Finally, T1 gets 700(Inconsistent result) instead of 1000.

*Read skew occurs.

Step 9 COMMIT; T1 commits.

These steps below also shows read skew. *300 is withdrawn from Tom's balance. Then, T1 reads 100(Inconsistent data) instead of 400. Finally, 600 + 100 = 700(Inconsistent result) in the total of John's and Tom's balances:

Flow Transaction 1 (T1) Transaction 2 (T2) Explanation
Step 1 BEGIN; T1 starts.
Step 2 BEGIN; T2 starts.
Step 3 SELECT balance FROM bank_account WHERE id = 1;

600
T1 reads 600.
Step 4 UPDATE bank_account set balance = 100 WHERE id = 2; T2 updates 400 to 100 because 300 is withdrawn from Tom's balance.
Step 5 COMMIT; T2 commits.
Step 6 SELECT balance FROM bank_account WHERE id = 2;

100
T1 reads 100(Inconsistent data) instead of 400 after T2 commits.
Step 7 600 + 100 = 700 Finally, T1 gets 700(Inconsistent result) instead of 1000.

*Read skew occurs.

Step 8 COMMIT; T1 commits.

In addition, this is also the example of read skew. There are teacher and student tables with id, name as shown below.

teacher table:

id name
1 John
2 Tom

student table:

id name
1 Anna
2 Sarah
3 David
4 Mark
5 Kai

These steps below shows read skew. *Lisa, Peter and Roy are inserted to "student" table. Then, T1 reads 8(Inconsistent data) instead of 5. Finally, 2 + 8 = 9(Inconsistent result) in the total of teachers and students:

Flow Transaction 1 (T1) Transaction 2 (T2) Explanation
Step 1 BEGIN; T1 starts.
Step 2 BEGIN; T2 starts.
Step 3 SELECT count(*) FROM teacher;

2
T1 reads 2.
Step 4 INSERT INTO student values (6, 'Lisa'), (7, 'Peter'), (8, 'Roy'); T2 inserts Lisa, Peter and Roy.
Step 5 COMMIT; T2 commits.
Step 6 SELECT count(*) FROM student;

8
T1 reads 8(Inconsistent data) instead of 5 after T2 commits.
Step 7 2 + 8 = 10 Finally, T1 gets 10(Inconsistent result) instead of 7.

*Read skew occurs!!

Step 8 COMMIT; T1 commits.

These tables below are my experiment results of read skew with each isolation level in MySQL and PostgreSQL. *Yes means Occurs, No means Doesn't occur.

MySQL:

Isolation level Read skew
READ UNCOMMITTED Yes
READ COMMITTED Yes
REPEATABLE READ No
SERIALIZABLE No

PostgreSQL:

Isolation level Read skew
READ UNCOMMITTED Yes
READ COMMITTED Yes
REPEATABLE READ No
SERIALIZABLE No
Super Kai - Kazuya Ito
  • 22,221
  • 10
  • 124
  • 129
1

1. What is the difference between read skew and non-repeatable read?

We have two data - let x and y, and there is a relation between them.(e.g parent/child)

Transaction T1 reads x, and then a second transaction T2 updates x and y to new values and commits. If now T1 reads y, it may see an inconsistent state, and therefore produce an inconsistent state as output.

Acceptable consistent states:

x and y

*x and *y

Note: * denotes the updated value of the variable

When x and y are the same data, meaning to read them, need to execute the same query. I guess, it leads to the problem of non-repeatable.

IMHO, even if we may call read skew is a generalization form of a non-repeatable problem.

2. Can read skew and non-repeatable read be both prevented by REPEATABLE READ or SERIALIZABLE?

Serializable isolation level permits transactions to run concurrently, it creates the effect that transactions are running in serial order: read skew/non-repeatable prevented

Repeatable read isolation level guarantees that each transaction will return the same row regardless of how many times executed. From the definition, it seems read-skew may not be prevented. Without knowing how it is implemented, it is hard to claim anything.

Today, DBMS engines use different approaches-concurrency control strategies to implement REPEATABLE Isolation level.

e.g Postgres use database snapshot(consistent view) to implement REPEATABLE READ Isolation level: it will prevent read skew

Other engines may use lock-based concurrency control mechanisms to implement it. - may not prevent read skew.

egasimov
  • 41
  • 5