5

I am using InnoDB and have the following table

officeRechNr

year  | month  | id   | 
------------------------
2016  | 7      | 2    | 
2016  | 6      | 5    |
2016  | 5      | 6    |

My script works as follows:

  1. Get id from officeRechNr from current year and month

  2. Increase id by one and update in officeRechNr

  3. Echo increased id

So if the script would be executed one after another, I would expect:

New id is 3
New id is 4
New id is 5

I assumed that this behaves differently when I execute the script parallel.

Here is my script:

 $db = new mysqli("localhost","user","pass","db");

 $year     = date("Y");
 $month    = date("m");

 $stmt         = $db->prepare('SELECT zahl FROM officeRechNr WHERE jahr = ? AND monat = ?');

 $stmt->bind_param('ii', $year, $month);
 $stmt->execute();

 $result            =   $stmt->get_result();
 $row           =   $result->fetch_assoc();

 $number       =  $row['zahl'] + 1;
 sleep(20);

 $stmt         = $db->prepare('UPDATE officeRechNr set zahl = ? WHERE jahr = ? and monat = ?');
 $stmt->bind_param('iii',$number, $year, $month);
 $stmt->execute();
 echo "New id is $number";

I executed both scripts simultaneously, so I assumed $number must be equal to 3 for both scripts and then they sleep for 20 seconds. Therefore I would expect

New id is 3
New id is 3

to my surprise, the outcome was

New id is 3
New id is 4

I thought I need to write-lock the database to achieve this result, as explained in Lock mysql table with php.

Why did I not get the expected outcome? Does this script always return different id's - even if executed at exact same time?

Community
  • 1
  • 1
Adam
  • 25,960
  • 22
  • 158
  • 247
  • Nothing surprising about this result actually, just because a race condition can happen doesn't mean that it always happens. – e4c5 Jul 23 '16 at 16:00
  • @e4c5 but both scripts were sleeping for 20 seconds, I dont understand how they cannot have the same `$number` value when I execute both scripts within only 1 second. – Adam Jul 23 '16 at 16:01
  • How do you know that both threads were not sleeping at the same time? If you want to force a race condition, have about 10 15 threads running in parrallel – e4c5 Jul 23 '16 at 16:02
  • no "threads" involved. – Ven Jul 23 '16 at 16:04
  • @Adam how did you launch "both scripts simultaneously"? please add it to your question – Ven Jul 23 '16 at 16:05
  • @Ven I opened two tabs in Firefox and started the script in each tab. They do not start exactly simultaneously but with a delay of a few second or something. I still expect that they both return the same number, because they read the current number and then sleep for 20 seconds before actually increasing the id. – Adam Jul 23 '16 at 16:11
  • @Adam what's the type of the column? – Ven Jul 23 '16 at 16:13
  • @Adam and are you sure your webserver didn't wait for the 1st request to finish before executing the second one? – Ven Jul 23 '16 at 16:14
  • @Ven they are all int(11) columns. year & month together build the primary key. For your second question: I do not know. How can I find this out? I did this experiment very often. When I start 10 tabs, then they dont all finish after 20 seconds, it seems like they stack up or something. – Adam Jul 23 '16 at 16:17
  • @Ven that is what is happening, i was just going to write a answer based on this, but go ahead – BinaryGhost Jul 23 '16 at 16:17
  • @Adam What server are you using? – BinaryGhost Jul 23 '16 at 16:18
  • @BinaryGhost I am using xampp – Adam Jul 23 '16 at 16:18
  • @Adam I think that xamp is using a 1 php process for all of the requets, so that scripts wait for each other to finish – BinaryGhost Jul 23 '16 at 16:21

3 Answers3

3

It seems like your webserver is unable to execute more than one request at a time.

What this means, in practice, is that even if you open your script in two different times, the server will not execute the second request before the first one is finished.

The timeline looks like this:

  • You open tab 1.
  • You open tab 2.
  • Webserver receives the query from tab 1, and starts executing it.
  • Webserver starts executing it.
  • Tab 1 reads "3" from the DB.
  • Webserves receives the query from tab 2. However, it doesn't have any free worker, so it delays its execution (puts it in a waiting list).
  • Tab 1 stores "4" in the DB.
  • Webserver completes the execution of your tab 1 query, and returns the result.
  • Now that the webserver has nothing to do, it looks at its waiting list, finds the query from tab 2, and starts executing it.
  • Tab 2 reads "4" from the DB.
  • Tab 2 stores "5" in the DB.
  • It completes of the tab 2 query and returns the result.

Of course, in a production environment, a real webserver will execute multiple requests at once, which means your script is very unsafe.

Prefer using your database's builtin solutions for this (AUTO_INCREMENT for MySQL, SERIAL for PostgreSQL, etc).

Ven
  • 19,015
  • 2
  • 41
  • 61
  • Thank you. Is it correct that on a real webserver I would always get different `id`'s with the script (even when executed parallel) when I use `lock tables` at the beginning and `unlock tables` as explained in http://stackoverflow.com/questions/12179290/lock-mysql-table-with-php? – Adam Jul 23 '16 at 16:27
  • Locking tables would be very overkill! Anything wrong with AUTO INCREMENT as I suggested? – Ven Jul 23 '16 at 16:28
  • The problem with AUTO INCREMENT is that for each key (year,month) the id starts from 1 again and increases then. So actually I first check if the key (year,month) exists, if not I add (year,month,1), otherwise I increase id by one. Not sure how to realize this with AUTO INCREMENT, but I assume I should post another question for this issue. – Adam Jul 23 '16 at 16:37
1

Adam, it seems that your queries are not executing in parallel.

Record times at crucial events and print them to confirm this.

After the first statement execute:

$stmt = $db->prepare('SELECT zahl ...');
...
$stmt->execute();
$times['select'] = date('H:i:s'); //human-readable format. eg: 11:15:32

Then again after the value is updated at the server:

$stmt = $db->prepare('UPDATE officeRechNr ...');
...
$stmt->execute();
$times['update'] = date('H:i:s');

I suspect we'll find out if you do print_r($times) for each query that the queries were not happening in parallel, but in sequence.

This is definitely the case if you're using the built-in PHP server. The docs say:

The web server runs a only one single-threaded process, so PHP applications will stall if a request is blocked.

BeetleJuice
  • 39,516
  • 19
  • 105
  • 165
  • Yes doing this experiment reveals that the querys are executed in queries. – Adam Jul 23 '16 at 16:24
  • @Adam It was a great question. I added a reference to the docs that applies to you if you're using the built-in PHP server – BeetleJuice Jul 23 '16 at 16:27
1

If you want locking but not full-table locking, try using SELECT...FOR UPDATE in a transaction to lock just the rows you select (assuming you have an index on jahr, monat). See http://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html

Alternatively, you don't need locking at all. You could use

UPDATE officeRechNr set zahl = LAST_INSERT_ID(zahl+1) WHERE jahr = ? and monat = ?

Read the docs of http://dev.mysql.com/doc/refman/5.7/en/information-functions.html#function_last-insert-id for details.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thank you, this is just what I was looking for instead of the locking. – Adam Jul 23 '16 at 18:16
  • 1
    And of course I have to plug my own book: [SQL Antipatterns: Avoiding the Pitfalls of Database Programming](https://pragprog.com/book/bksqla/sql-antipatterns). It's not really an introductory book -- it assumes the reader knows the basics, or can look them up. – Bill Karwin Jul 23 '16 at 19:33
  • Hey Bill.. I know this was a long time ago, but I used that query and enjoyed it but it created a problem, maybe you could have a look at https://stackoverflow.com/questions/47440180/last-insert-id-is-unequal-to-db-insert-id ? Also, could you explain why I don't need locking with that query? As far as I have understood the docs, `zahl = LAST_INSERT_ID(zahl+1) ` is just necessary to save the updated number for `SELECT LAST_INSERT_ID()`. But if this query is executed parallel (2 times) then I think its still possible that the id is only increased once..? Thank you! – Adam Nov 23 '17 at 13:26