
Explanation

An API call (to another service), which usually takes 10-20 seconds to respond, is stored in the database.

After it is stored, the system tries to use the API immediately to show the result to the user. This instant call might fail (in which case we display that it failed but will try again automatically), so there is also a cron job that runs every 30 seconds and retries the failed queries.

If the API returns success (whether from the instant call or from the cron job), the flag is changed to success in the database and the call will not run again.

Issue

My problem is that while the instant API call is in progress, the cron job might also fire another call for the same query, because it is not yet flagged as successful.

Also, in rare cases, while the previous cron job is still in progress, the next cron job might run the same code again.

What I have already tried to prevent the issue

I tried storing in-process API calls in a database table with Status=1, deleting them when the API call succeeded, or setting the status back to 0 if it failed:

 if ($status === 0)
 {
     
     // Set Status to 1 in Database First (or die() if database update failed)
     
     // Then Call The API

     // If Failed Set Status to 0 so Cron Job can try again
     
     // If Successful Change Flag to success and remove from queue

 }
     

But what if the instant call and the cron job call happen at the exact same time? They both check whether the status is 0 (which it is), then both set the status to 1 and both execute the API call...
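One way to close this race (an editor's sketch, not from the question itself) is to collapse the check and the set into a single atomic UPDATE and inspect the affected-row count; the `api_queue` table name and the in-memory SQLite backend below are illustrative only:

```php
<?php
// Sketch of an atomic "claim" using a conditional UPDATE, assuming a
// hypothetical `api_queue` table with a `status` column (0 = pending,
// 1 = in process). SQLite in-memory is used here only for illustration.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE api_queue (id INTEGER PRIMARY KEY, status INTEGER)');
$db->exec('INSERT INTO api_queue (id, status) VALUES (42, 0)');

function claim(PDO $db, int $id): bool {
    // The WHERE clause makes check-and-set a single atomic statement:
    // only one caller can flip status from 0 to 1 for a given row.
    $stmt = $db->prepare('UPDATE api_queue SET status = 1 WHERE id = ? AND status = 0');
    $stmt->execute([$id]);
    return $stmt->rowCount() === 1; // true only for the caller that won the race
}

var_dump(claim($db, 42)); // first caller wins the job
var_dump(claim($db, 42)); // second caller loses and should back off
```

With this shape, two processes that race on the same row can never both "win": the database serializes the two UPDATEs, and only one of them reports an affected row.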

Questions

  1. Is what I have tried the correct way to handle this?

  2. Should I worry about them happening at the exact same time (the issue I explained in the yellow quote above) if there are a lot of calls (sometimes 500+/sec)?

Update Before Bounty

Isn't there really an easy way to handle such cases on the PHP side? If not, which approach is better in experts' opinion? Below are some methods, but none of them is detailed enough and none of them has any downvotes/upvotes.

P.S. There are many updates/inserts to the database; I don't think locking is an efficient idea, and I'm not sure about the rest of the ideas.

Vladimir

8 Answers


This is exactly what semaphores were created for.

Using semaphores in PHP is actually very straightforward. There are only 4 semaphore functions:

sem_acquire() – Attempt to acquire control of a semaphore.
sem_get() – Creates (or gets if already present) a semaphore.
sem_release() – Releases the semaphore if it is already acquired.
sem_remove() – Removes (deletes) a semaphore.

So how do they all work together?

  1. First, you call sem_get() to fetch the identifier for the semaphore.
  2. After that, one of your processes will call sem_acquire() to try and acquire the semaphore. If it’s currently unavailable, sem_acquire() will block until the semaphore is released by another process.
  3. Once the semaphore is acquired, you may access the resource that you are controlling with it.
  4. After you are done with the resource, call sem_release() so that another process can acquire the semaphore.
  5. When all is said and done, and you’ve made sure that none of your processes require the semaphore anymore, you can call sem_remove() to remove the semaphore completely.
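The steps above can be sketched as follows (a minimal illustration, not from the original answer; the ftok() key derivation and the echo stand in for a real shared resource, and the sysvsem extension must be available, which excludes Windows):

```php
<?php
// Minimal sketch of the four sysvsem calls described above.
// Requires the sysvsem extension (POSIX systems only).
$key = ftok(__FILE__, 'a');        // derive an IPC key from this file
$sem = sem_get($key, 1);           // create (or fetch) a semaphore with 1 slot

if (sem_acquire($sem)) {           // blocks until the semaphore is free
    // --- critical section: only one process runs this at a time ---
    echo "lock held\n";
    // --- end critical section ---
    sem_release($sem);             // let the next waiting process in
}

sem_remove($sem);                  // delete the semaphore once no one needs it
```

Running two copies of this script concurrently should show the second one waiting inside sem_acquire() until the first releases.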

You can find more information and an example about this in this article.

Lovau
    Semaphores will work when processes are running on the same machine. They are probably better used for access to local machine resources like files. You also need to deal with crashed processes. – alex347 Jun 04 '15 at 16:33
  • @alex347 yeah, this is a big deal with semaphores: if a process crashes without releasing the semaphore, it's game over... Using try/catch/finally is highly recommended to always release the semaphore before crashing... only a `kill -9` may cause an instant kill and so no time to release anything before dying... but there is no solution for this... What is the solution for remote processes? – Random Jun 08 '15 at 13:36
  • It is worth mentioning that these beautiful methods do not work on Windows and on some web hosting providers. – M at Dec 09 '19 at 13:22

What I do in my scripts is (pseudocode):

SCRIPT START
LOCK FILE 'MYPROCESSFILE.LOCK'
DO SOMETHING I WANT
UNLOCK FILE 'MYPROCESSFILE.LOCK'
SCRIPT END

So if the file is locked, the second (duplicate) process won't run (it will block and wait) until the file is unlocked by the original process.

EDIT: updated with working PHP code

<?php

    class Locker {

        public $filename;
        private $_lock;

        public function __construct($filename) {
            $this->filename = $filename;
        }

        /**
         * locks relevant file
         */
        public function lock() {
                touch($this->filename);
                $this->_lock = fopen($this->filename, 'r');
                flock($this->_lock, LOCK_EX);
        }

        /**
         * unlock above file
         */
        public function unlock() {
                flock($this->_lock, LOCK_UN);
                fclose($this->_lock); // release the handle as well as the lock
        }

    }

    $locker = new Locker('locker.lock');
    echo "Waiting\n";
    $locker->lock();
    echo "Sleeping\n";
    sleep(30);
    echo "Done\n";
    $locker->unlock();

?>
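A non-blocking variant (an editor's sketch, not part of the original answer): passing LOCK_NB makes flock() return false immediately instead of waiting, which suits the cron case where a duplicate run should simply skip its turn rather than queue up behind the first one. The lock file path below is illustrative:

```php
<?php
// Non-blocking variant of the flock() approach above. LOCK_NB makes
// flock() fail fast if another process holds the lock, so a duplicate
// cron run can exit immediately instead of piling up behind the first.
$fp = fopen('/tmp/cron.lock', 'c');   // 'c' creates the file if missing

if (!flock($fp, LOCK_EX | LOCK_NB)) {
    echo "another run is in progress, exiting\n";
    exit(0);
}

// ... do the API work here, protected by the lock ...

flock($fp, LOCK_UN);                  // release the lock
fclose($fp);
```

Note that, like all flock()-based schemes, this only coordinates processes on the same machine; a kernel-held flock is released automatically if the process crashes, which avoids the stale-lock problem of plain lock files.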
Jacek Pietal
  • just save this to a file as locker.php, run `php locker.php`, and in another window run `php locker.php`; you will see one is sleeping while the other is waiting – Jacek Pietal Jun 08 '15 at 14:30
  • What about multi server applications? This will work when all php scripts are located on the same server only. – alex347 Jun 09 '15 at 12:45
  • @Prozi Thank you very much. What happens if they both run at the exact same time? (I can already handle the issue if the next code runs, for example, 10 ms after the first.) – Vladimir Jun 09 '15 at 19:17
  • read about the PHP function flock(): A starts, B starts, A locks the file, B waits because of the lock, A finishes, B locks the file, B finishes – Jacek Pietal Jun 09 '15 at 22:06
  • Great Solution @Prozi helped me with my similar issue – Josh May 29 '19 at 00:00

You need a proper queuing solution here. You can implement it yourself using a queue table and table locks to avoid different processes picking up the same job.

So you can pick up tasks from the queue table like this:

LOCK TABLES table WRITE;
SELECT * FROM table WHERE status = 0 LIMIT 1;
UPDATE table SET status = 1 WHERE job_id = <selected job_id>;
UNLOCK TABLES;

Locking the table will ensure that other processes don't do SELECTs and don't pick up the same row from the table.

Inserting the job to the queue as simple as this:

INSERT INTO table (job_id, status) VALUES(NULL, 0);

Removing the job after processing is completed:

DELETE FROM table WHERE job_id = 12345;
alex347
  • Thanks, this actually seems simpler/more understandable than the other ways to me, but what will happen to my inserts? If I try to INSERT another API call into this queue table while it's locked, will my INSERT query fail (and I have to show a Failed error to the user)? Or will it wait in the MySQL queue and happen after the unlock, so no `Query Failed` errors on the PHP side? (The user is waiting to see the result.) Can't we just lock SELECTs on this table and allow inserts? – Vladimir Jun 04 '15 at 20:20
  • 1
    This will lock tables for any queries (SELECT/INSERT/UPDATE/DELETE etc). Only the process which acquired the lock will be able to work with the table. Queries from other processes will not fail, they will wait until table becomes unlocked. If table is locked for more than something like 50 seconds (depends on your server configuration) then you may hit a lock timeout (mysql error 1205). In that case you'll just need to restart your query or increase the timeout setting in the config. – alex347 Jun 05 '15 at 08:35
  • actually the default lock timeout is 31536000 seconds or one year, so you shouldn't hit that error – alex347 Jun 05 '15 at 08:49
  • @alex347 What a strange default timeout... if a process gets locked for 1 year, there *may* be a problem indeed ! – Random Jun 08 '15 at 13:55
  • @Random Mysql timeouts can be configured from php. If you have a simple table for the queue purposes only and use the queries I provided, the table will not be locked for longer than a second or so, usually it is just milliseconds. – alex347 Jun 09 '15 at 12:53

On each cron job start, check whether the lock file exists. If it exists, exit; if not, create the lock file in some temp directory. After the API process completes, unlink the file.

Sardesh Sharma
  • You could also try Peterson's two-process solution for the critical section problem: http://en.wikipedia.org/wiki/Peterson's_algorithm – Sardesh Sharma May 31 '15 at 19:41
  • I don't see how this is any different from what I have done (setting and checking status === 1 and 0 in the database). What if they both check at the same time, while the file doesn't exist, and then both of them try to create the lock file? – Vladimir Jun 02 '15 at 11:24
  • Have you followed the Peterson's algorithm link? Store flag1, flag2, and turn in a separate table for shared access, and follow the process-1 and process-2 algorithm; it will ensure that only one process runs at a time. I think this should work for your problem. – Sardesh Sharma Jun 03 '15 at 09:08

Since you should know the times the cron will run (say, every 5 minutes), your user-requested function could check whether the system time is exactly when a cron should be running. That would at least prevent them from running at the exact same time.

  • What, a siren goes off in the building every 30 seconds? Should drastically improve worker performance! – Drew May 30 '15 at 20:00

I use this on Linux to see if a script is running when multiple execution needs to be avoided:

$output = array();
exec('pgrep -fl the_script.php', $output);

Then scan through $output and determine if it is already in progress.

For instance, here is copy/paste of existing code:

$exec_output = array();
exec('pgrep -fl archiver.php', $exec_output);
$pid_count = 0;
foreach ($exec_output as $line) {
    $parts = explode(' ', $line);
    if (basename($parts[2]) == 'archiver.php') $pid_count++;
}

Then do things based on $pid_count. The basename() check is to make sure I don't catch some other thing like special_archiver.php or whatever might exist. You can check for the full path as well.

mike.k

Semaphores can be installed in PHP, and since they are controlled at the kernel level, they handle process locking atomically. Unix was designed to use this mechanism, alongside other methods like signals, for interprocess communication. I'm not sure you need to get that sophisticated.

Inspecting the output of ps -ef may work, but it can be affected by system load and process priority. A database flag may also work, but why add the overhead? Databases can get busy.

I would think a simple file is probably less prone to latency issues when you want to make 500 checks per second.

e.g. if the cron script starts with

if [ ! -f otherprocessisrunning ]
then
    # create/open the file
    > cronprocessisrunning

    # when the cron process finishes,
    # it removes the cronprocessisrunning file
    rm -f cronprocessisrunning
else
    # sleep for 2 minutes, then call this function again
    sleep 120
fi

and the other script has the same behaviour in php that does this

if (! file_exists(cronprocessisrunning))
    > otherprocessisrunning
    start the other process
    when it is finished, remove otherprocessisrunning
endif

It should be fast enough, as creating file handles (with no content) translates to a simple system call. If it is not, try it in a bash shell.

  • Why would you charge the server so much (500 checks / second !) when you already had the correct way in the first place : using semaphore ? :) – Lovau Jun 02 '15 at 09:55
  • Sure, Semaphores are a correct solution. This method would be (I think) more successful than some of the up-voted ideas which appear to work but (like this) are not well controlled ways of multitasking latch control. But if it can be done without installing the Semaphores module (which may be tricky on shared hosting, etc.), then I would flag it as an inode, rather than a DB entity or inspecting the text output of ps. – Nicholas Alexander Jun 02 '15 at 17:10

I don't know if this is a good way:

temp_queue Table
-----------------------
id --> Int, Index, Autoincrement
query_id --> Int (your query ID or something to identify a specific query)
in_use_by --> varchar (cron or api)

Cron job:

Script starts

SELECT in_use_by FROM temp_queue ORDER BY id ASC LIMIT 1;

if results != 0 return;

INSERT INTO temp_queue SET query_id=SOME_ID, in_use_by = 'cron';
SELECT in_use_by FROM temp_queue ORDER BY id ASC LIMIT 1;

Then check the last SELECT results

if in_use_by == 'cron' continue
else return

When the execution ends:

DELETE FROM temp_queue WHERE query_id=SOME_ID

API job:

Script starts

SELECT in_use_by FROM temp_queue ORDER BY id ASC LIMIT 1;

if results != 0 return;

INSERT INTO temp_queue SET query_id=SOME_ID, in_use_by = 'api';
SELECT in_use_by FROM temp_queue ORDER BY id ASC LIMIT 1;

Then check the last SELECT results

if in_use_by == 'api' continue
else return

When the execution ends:

DELETE FROM temp_queue WHERE query_id=SOME_ID

What happens if the cron job and the API try to claim the query at exactly the same time? They will both check the first written row with query_id=SOME_ID, so only one of them will continue.

Yes, that is a lot of selects, inserts, and deletes. But it works.

What do you guys think about that?

nada