
The script below is called every 5 seconds. The issue is that if the server is responding slowly, one entry in `blogs` can get selected twice in a row, because the server hasn't had time to set `done` to `1` yet. Is there an industry-standard (or whatever you call it) way to prevent this from happening?

$result = mysql_query("SELECT * FROM blogs WHERE done=0 LIMIT 1");
$rows = mysql_num_rows($result); //If there are no entries with done set to 0, that means we've done them all; reset all entries to 0.
if($rows == 0)
{
    mysql_query("UPDATE blogs SET done=0 WHERE done=1");
}
else
{
    while($row = mysql_fetch_array($result))
    {
        mysql_query("UPDATE blogs SET done=1 WHERE id=$row[id]");
        // Do stuff
    }
}

I think I could change it to

while($row = mysql_fetch_array($result))
{
      if($row['done'] == 1){ die; }
      mysql_query("UPDATE blogs SET done=1 WHERE id=$row[id]");
      //Do stuff
}

But will that really fix the problem? I would imagine there is a better way that prevents it from happening beyond a shadow of a doubt.

natli
  • possible duplicate of [Processing action twice - how to prevent that?](http://stackoverflow.com/questions/6235654/processing-action-twice-how-to-prevent-that) – genesis Oct 08 '11 at 16:33
  • May I ask what mechanism you are using to make the script run every 5 seconds? What is the code for that? – Luke Oct 08 '11 at 16:33
  • @Coulton just add a timestamp anywhere in the MySQL database and compare it to the current time in your PHP script. If there's more than a 5-second difference, run it; if not, run die; Of course, you would have to visit the website every 5 seconds for this to work, but there are plenty of third-party services to do that, or you could even use a cron job. – natli Oct 08 '11 at 16:37
  • @genesis-φ Unfortunately that page doesn't provide any solutions at all. But you're right, it's pretty much the same question. – natli Oct 08 '11 at 16:42
  • @natli Sadly I think that you've got your logic all wrong, or you're trying to achieve something in a way that could be achieved more logically. What is it that this code should achieve? There is no code you could add to prevent the same script being submitted at the same time from different computers. If you need something to happen that is specific to each individual user, it might be worth saving data in sessions: see [docs here](http://www.php.net/manual/en/function.session-start.php) – Luke Oct 08 '11 at 16:44
  • Sessions couldn't possibly help with this in any way at all. It's just a background task that needs to be completed for all rows in the database. This could easily be achieved by removing LIMIT 1 from the query, but that would mean I'd have to set the time limit to infinite, which is something I don't want to do. Doing it this way ensures that 1. every row handled gets its own process, 2. its own timeout, and 3. there are at least 5 seconds between handling each row. – natli Oct 08 '11 at 16:54

3 Answers


I think the best way to prevent selecting the same row twice is to use `SELECT GET_LOCK("lock_name");` and `SELECT RELEASE_LOCK("lock_name");`. When you hold a lock from the MySQL server, other processes trying to get the same lock will wait until it is released. Below is a sample implementation:

<?php
function getLock($lockName, $dbc) {
    $query = "SELECT GET_LOCK('".$lockName."', 0)";
    $result = mysql_query($query, $dbc);
    $lockResult = mysql_fetch_row($result);
    $lockResult = $lockResult[0];
    return $lockResult == 1;
}

function releaseLock($lockName, $dbc) {
    $query = "SELECT RELEASE_LOCK('".$lockName."')";
    $result = mysql_query($query, $dbc);
}

// CONNECT TO DATABASE
$dbc = mysql_connect('localhost', 'root', '');
mysql_select_db('test', $dbc);

$loopQueue = true;
$rowsProcessed = 0;

// MAIN QUEUE LOOP
while ($loopQueue) {

    // TRY UNTIL GETTING A LOCK
    $queueLockName = 'queue_lock_1';
    while (getLock($queueLockName, $dbc) === true) {

        // WE GOT THE LOCK, GET A QUEUE ROW WITH PENDING STATUS
        $query = 'SELECT * FROM test WHERE status = 0 ORDER BY ID ASC LIMIT 1';
        $result = mysql_query($query, $dbc);

        if (mysql_num_rows($result) < 1) {
            // SINCE WE DON'T HAVE ANY QUEUE ROWS, RELEASE THE LOCK
            releaseLock($queueLockName, $dbc);
            // WE DONT NEED TO LOOP THE MAIN QUEUE ANYMORE SINCE WE DONT HAVE ANY QUEUE ROWS PENDING
            $loopQueue = false;
            // BREAK THIS LOOP
            break;
        }

        // WE GOT THE QUEUE ROW, CONVERT IT TO ARRAY
        $queueRowArray = mysql_fetch_assoc($result);

        // UPDATE QUEUE ROW STATUS TO SENDING
        $query = 'UPDATE test SET status = 1 WHERE id = '.$queueRowArray['id'];
        mysql_query($query, $dbc);

        // RELEASE THE LOCK SO OTHER JOBS CAN GET QUEUE ROWS
        releaseLock($queueLockName, $dbc);

        // DO STUFF ...

        // UPDATE QUEUE ROW STATUS TO PROCESSED
        $query = 'UPDATE test SET status = 2 WHERE id = '.$queueRowArray['id'];
        mysql_query($query, $dbc);

        $rowsProcessed++;
    }
}

echo "\n\n".'process finished ('.$rowsProcessed.')'."\n\n";
matte
  • Do I understand this correctly: does the lock lock the entire database, and not just the one row we're working on? Because that would mean no two (or more) processes can run side by side; they'd have to wait for each other to finish, wasting a lot of time. It does look interesting, though. – natli Oct 08 '11 at 19:06
  • @natli Acquiring a lock with `GET_LOCK` does not lock the database or any table. `GET_LOCK` is mainly used to obtain application level locks, not database level locks. You can find more information on official documentation: http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_get-lock – matte Oct 08 '11 at 19:12
  • Actually, I use this method myself on a job queue where 10-12 processes work on the queue all the time, and there hasn't been any duplicate processing. Duplicates are impossible because there is no way to get a lock while it is held by another process (until that process releases it). – matte Oct 08 '11 at 19:21

I would give transactions a go. Here is an example in another Stack Overflow question.

Just a question: what happens if the server is even slower? For instance, suppose the select statement takes so long (e.g. 5 seconds) that once it finishes (returning 0 rows), the next select has already been executed (returning 1 or more rows).

MySQL documentation
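The idea behind the transactional approach is to select the row and mark it done inside a single transaction, so no other process can grab it in between. A minimal sketch, in Python with SQLite purely for illustration: SQLite's `BEGIN IMMEDIATE` takes a write lock for the whole transaction, standing in for what you would do on MySQL/InnoDB with `SELECT ... FOR UPDATE`.

```python
import sqlite3

# Illustrative sketch only; table and column names mirror the question.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage transactions
conn.execute("CREATE TABLE blogs (id INTEGER PRIMARY KEY, done INTEGER DEFAULT 0)")
conn.execute("INSERT INTO blogs (done) VALUES (0), (0)")

def claim_next(conn):
    """Select one pending row and mark it done inside a single transaction."""
    conn.execute("BEGIN IMMEDIATE")  # lock out concurrent writers until COMMIT
    row = conn.execute(
        "SELECT id FROM blogs WHERE done = 0 ORDER BY id LIMIT 1").fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    conn.execute("UPDATE blogs SET done = 1 WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row[0]

print(claim_next(conn))  # → 1
print(claim_next(conn))  # → 2
print(claim_next(conn))  # → None (everything is marked done)
```

Because the SELECT and the UPDATE commit together, a second worker that starts between them simply blocks until the first worker's transaction finishes, and then no longer sees the row as pending.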

Community
Mosty Mostacho
  • I'll read up on transactions some more, I don't quite understand it yet from that question alone and the mysql docs always look like Chinese to me. Looks good though, thanks. But how would it be possible for the second select to find rows if the first select never got to set any "done" fields to 1? And if you mean that the second select would find rows with done=1 while the first script was still updating other rows to 1, that wouldn't hurt. I guess I don't really understand what you're saying. – natli Oct 08 '11 at 19:01
  • Oh, probably I made a wrong assumption. I assumed the "done" value could also be modified somewhere else (e.g. by the owner of the blog performing some action over it). If that was the case, the first select would run, the user would update it, and then the next select would run and return different data. Just a concurrency issue that maybe you are not having – Mosty Mostacho Oct 08 '11 at 19:14
$result = mysql_query("SELECT * FROM blogs WHERE done=0 LIMIT 1");
$rows = mysql_num_rows($result); //If there are no entries with done set to 0, that means we've done them all; reset all entries to 0.
if($rows == 0)
{
    mysql_query("UPDATE blogs SET done=0 WHERE done=1");
}
else
{
    while($row = mysql_fetch_array($result))
    {
        mysql_query("UPDATE blogs SET done=1 WHERE id=$row[id] AND done=0");
        if(mysql_affected_rows() != 1)
            die();
        // Do stuff
    }
}
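The `AND done=0` in the UPDATE is what makes this safe: the statement becomes an atomic test-and-set, so only one process can win the flip from 0 to 1, and everyone else sees zero affected rows. A minimal sketch of the same pattern, in Python with SQLite purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blogs (id INTEGER PRIMARY KEY, done INTEGER DEFAULT 0)")
conn.execute("INSERT INTO blogs (done) VALUES (0)")
conn.commit()

def try_claim(conn, row_id):
    """Atomic test-and-set: only one caller can flip done from 0 to 1."""
    cur = conn.execute(
        "UPDATE blogs SET done = 1 WHERE id = ? AND done = 0", (row_id,))
    conn.commit()
    # cur.rowcount plays the role of mysql_affected_rows(): 1 means we won
    # the race and own the row; 0 means another worker already claimed it.
    return cur.rowcount == 1

print(try_claim(conn, 1))  # → True: we claimed the row
print(try_claim(conn, 1))  # → False: already claimed, so bail out (the die() above)
```

The check on the affected-row count, rather than a separate SELECT of `done`, is the essential part: the database evaluates the `done = 0` condition and performs the write as one indivisible statement.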
Dark Falcon
  • Thanks for the tip. If I do this, is there an absolute 0% chance of the problem occurring? As in, the two scripts can *never* reach //Do stuff at the same time? – natli Oct 08 '11 at 16:39
  • Assuming MyISAM or InnoDB tables and that `id` is unique, every statement is atomic, so yes: `done` can never be updated again when it is already 1. – Dark Falcon Oct 08 '11 at 16:43