6

I need some implementation advice. I have a MySQL DB that is written to remotely with tasks to process locally, and my application, which is written in PHP, needs to execute these tasks immediately as they come in.

But of course my PHP app needs to be told when to run. I thought about using cron jobs, but my app is on a Windows machine. Secondly, I need to check constantly, every few seconds, and cron can only run every minute.

I thought of writing a PHP daemon, but I am getting confused about how it would work and whether it's even a good idea!

I would appreciate any advice on the best way to do this.

Kay
  • 845
  • 7
  • 21
  • 33
  • You can run your script as a scheduled task using the PHP CLI, which is effectively the same as a cron job. However, I believe that the smallest time interval on a scheduled task is 5 minutes, so it wouldn't run immediately when tasks come in. – Brian Driscoll Apr 22 '11 at 17:59
  • @Brian - I didn't know about that. But I need it to be every 5 seconds or so. – Kay Apr 22 '11 at 18:00
  • @Kay - if that's the case then I think that you'll either have to have your script available to run on demand (e.g. host it in IIS) or create a daemon. – Brian Driscoll Apr 22 '11 at 18:02
  • IF the event that causes this update to be necessary is the result of another operation inside MySQL, then is this perhaps a case for using MySQL triggers? – Cups Apr 22 '11 at 18:08
  • @Cups - I think using triggers will mean there won't be a queuing system anymore as each trigger will just process that task. I was hoping to process the top 20 tasks so as not to overload the server. Once done, I would then do the next 20. – Kay Apr 22 '11 at 18:23
  • Nothing prevents you from creating a queue system in PHP. You just need MySQL to feed it with data as it comes in, put new tasks on the stack, and execute them in order. Also, what kind of tasks is the PHP script doing? Are they related to updating a MySQL table or anything similar? – Michael J.V. Apr 29 '11 at 12:05
  • How long should each task take to run? How long do you want to sleep between batches of 20 tasks? (I'm not sure where the 5s fits in; is that the maximum delay before starting a batch?) – Phil Lello May 04 '11 at 02:05
  • @BrianDriscoll, You could just run one every 5 minutes then do a loop which sleeps every 5 seconds `while(true){ work(); sleep(5); }` – Pacerier Feb 03 '15 at 07:29
  • @Key, Also look at [`sc`](http://support.microsoft.com/kb/251192) – Pacerier Feb 03 '15 at 16:37

8 Answers

8

pyCron is a good CRON alternative for Windows:

Since this task is quite simple I would just set up pyCron to run the following script every minute:

set_time_limit(60); // one minute, same as CRON ;)
ignore_user_abort(false); // you might wanna set this to true

while (true)
{
    $jobs = getPendingJobs();

    if ((is_array($jobs) === true) && (count($jobs) > 0))
    {
        foreach ($jobs as $job)
        {
            if (executeJob($job) === true)
            {
                markCompleted($job);
            }
        }
    }

    sleep(1); // avoid eating unnecessary CPU cycles
}

This way, if the computer goes down, you'll have a worst case delay of 60 seconds.

You might also want to look into semaphores or some kind of locking strategy (an APC variable, or checking for the existence of a lock file) to avoid race conditions. Using APC, for example:

set_time_limit(60); // one minute, same as CRON ;)
ignore_user_abort(false); // you might wanna set this to true

if (apc_add('lock', true, 60) === true) // atomically acquire the lock (ttl of 60 secs, same as set_time_limit)
{

    while (true)
    {
        $jobs = getPendingJobs();

        if ((is_array($jobs) === true) && (count($jobs) > 0))
        {
            foreach ($jobs as $job)
            {
                if (executeJob($job) === true)
                {
                    markCompleted($job);
                }
            }
        }

        sleep(1); // avoid eating unnecessary CPU cycles
    }
}

If you're sticking with the PHP daemon idea, do yourself a favor and drop it; use Gearman instead.

EDIT: I asked a related question once that might interest you: Anatomy of a Distributed System in PHP.

Alix Axel
  • 151,645
  • 95
  • 393
  • 500
3

I'll suggest something out of the ordinary: you said you need the task to run at the point the data is written to MySQL. That implies MySQL "knows" something should be executed, which sounds like a perfect scenario for MySQL's UDF sys_exec.

Basically, it would be nice if MySQL could invoke an external program once something happens inside it. With the mentioned UDF you can execute a PHP script from within, say, an INSERT or UPDATE trigger. Alternatively, you can make it more resource-friendly and create a MySQL event (assuming you're using an appropriate version) that uses sys_exec to invoke a PHP script at predefined intervals, which removes the need for cron or any similar scheduler.
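A hedged sketch of the trigger route described above, assuming the lib_mysqludf_sys UDF is installed; the tasks table and the runner.php path are made-up names:

```sql
-- fire a PHP worker whenever a new task row arrives;
-- note that sys_exec() blocks the INSERT until the command returns
CREATE TRIGGER tasks_after_insert
AFTER INSERT ON tasks
FOR EACH ROW
    SET @rc = sys_exec(CONCAT('php C:\\jobs\\runner.php ', NEW.id));
```

Because sys_exec blocks the writing transaction, for anything non-trivial it is usually better to have the invoked script return quickly (for example, just wake up a waiting worker) rather than do the heavy lifting inside the trigger itself.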

Michael J.V.
  • 5,499
  • 1
  • 20
  • 16
2

I would definitely not advise using cron jobs for this.

Cron jobs are a good thing, very useful and easy for many purposes, but as you describe your needs, I think they could cause more complications than they solve. Here are some things to consider:

  • What happens if jobs overlap, or one takes longer than a minute to execute? Are there shared resources, deadlocks, temp files? The most common method is to use a lock file and abort right at the start if it's occupied, but then the program also has to look for further jobs right before it completes. This can get complicated on Windows machines, which AFAIK don't support write locks out of the box.

  • Cron jobs are a pain to maintain. If you want to monitor them, you have to implement additional logic, such as a check for when the program last ran, which gets difficult if your program should run only on demand. The best approach is some sort of "job completed" field in the database, or deleting rows once they have been processed.

  • On most Unix-based systems cron is pretty stable these days, but there are plenty of ways to break a cron setup, most of them based on human error. For example, a sysadmin not exiting the crontab editor properly can wipe out all cron jobs. Many companies also have no proper monitoring, for the reasons stated above, and only notice once their services experience problems. At that point, often nobody has written down (or put under version control) which cron jobs should run, and wild guessing and reconstruction work begins.

  • Cron job maintenance gets even more complicated when external tools are used and the environment is not a native Unix system: sysadmins have to learn more programs, each with its own potential errors.

I honestly think a small script that you start from the console and leave running is just fine:

<?php
while (true) {
    $job = fetch_from_db();
    if (!$job) {
        sleep(10); // no pending work: wait before polling again
    } else {
        $job->process();
    }
}

You can also touch a file (updating its modification timestamp) on every loop, and write a Nagios check that alerts when that timestamp gets out of date, so you know your job is still running.
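The heartbeat idea above can be sketched like this (the file path and the 120-second staleness threshold are assumptions):

```php
<?php
// inside the worker loop: refresh the heartbeat file's mtime
touch('C:\\jobs\\heartbeat');

// Nagios-style check (run as a separate script): alert if the heartbeat is stale
$age = time() - filemtime('C:\\jobs\\heartbeat');
exit($age > 120 ? 2 : 0); // 2 = CRITICAL, 0 = OK in Nagios plugin conventions
```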

If you want it to start up with the system, I recommend a daemon.

PS: At the company I work for, there is a lot of background activity for our website (crawling, update processes, calculations, etc.) and the cron jobs were a real mess when I started there. They were spread over different servers responsible for different tasks. Databases were accessed wildly across the internet. A ton of NFS filesystems, Samba shares, etc. were in place to share resources. The place was full of single points of failure and bottlenecks, and something constantly broke. There were so many technologies involved that it was very difficult to maintain, and when something didn't work it took hours to track down the problem and another hour to figure out what that part was even supposed to do.

Now we have one unified update program that is responsible for literally everything. It runs on several servers, with a config file that defines the jobs to run, and everything gets dispatched from one parent process running an infinite loop. It's easy to monitor, customize and synchronize, and everything runs smoothly. It is redundant, it is synchronized, and the granularity is fine, so it runs in parallel and we can scale up to as many servers as we like.

I really suggest sitting down for enough time to think about the system as a whole, and then investing the time and effort to implement a solution that will serve you well in the future and doesn't spread tons of different programs throughout your system.

PPS:

I've read a lot about the minimum interval of 1/5 minutes for cron jobs/scheduled tasks. You can easily work around that with a small wrapper script that subdivides the interval:

// parent task runs every 5 minutes = 300 secs
// desired interval: 30 secs
$interval = 30;
$runs = 300 / $interval; // the parent interval must be a multiple of the desired interval

for ($i = 0; $i < $runs; $i++) {
    $start = time();
    system('php myscript.php');
    // compensate for the time the script itself took; you still need some
    // logic for the case where the script runs longer than the interval
    // (technique and problem described above)
    sleep(max(0, $interval - (time() - $start)));
}
The Surrican
  • 29,118
  • 24
  • 122
  • 168
  • He needs to be "checking every few seconds". Why do you say that a cron job isn't the right tool for this problem? Aren't scheduled tasks specifically built to do these things? Your suggestion seems more complicated than a simple cron. – Pacerier Feb 03 '15 at 16:35
1

This looks like a job for a job server ;) Have a look at Gearman. The additional benefit of this approach is that it is triggered by the remote side when, and only when, there is something to do, instead of polling. Especially at intervals smaller than (let's say) 5 minutes, polling is not very efficient any more, depending on the tasks the job performs.
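As a rough sketch of the push model (assuming the PECL gearman extension is installed and a Gearman server is running locally; the function name process_task and the payload are placeholders):

```php
<?php
// worker.php -- runs locally and blocks until jobs arrive (no polling)
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('process_task', function (GearmanJob $job) {
    $task = json_decode($job->workload(), true);
    // ... execute the task, mark it completed in MySQL ...
    return 'done';
});
while ($worker->work());
```

```php
<?php
// client.php -- whatever writes the task to MySQL also fires this
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('process_task', json_encode(['id' => 42]));
```

doBackground() returns immediately, so the remote writer isn't held up while the task runs.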

KingCrunch
  • 128,817
  • 21
  • 151
  • 173
  • Will this work if there are multiple sources that are writing to the job database? I mean they'll all have to make use of the gearman framework? This seems like a big tool for this task! – Kay Apr 22 '11 at 18:06
0

The quick and dirty way is to create a loop that continuously checks if there is new work.

Pseudo-code

ini_set("max_execution_time", "0"); // 0 = no time limit
$keeplooping = true;
while ($keeplooping) {

   if (check_for_work()) {
      process_work();
   }
   else {
     sleep(5);
   }

   // some way to change $keeplooping to false
   // you don't want to just kill the process, because it might still be doing something
}
AndrewR
  • 6,668
  • 1
  • 24
  • 38
  • 1
    If the server is restarted, or apache is restarted, this script will have to be started again which isn't exactly bullet proof to be honest. – Kay Apr 22 '11 at 18:15
  • Definitely correct. I don't know what PHP can do on Windows for automatic execution, other than maybe starting it with a scheduled task or a "startup" item. Maybe you'd want to look into creating a Windows service that could pull work, then pass it off to PHP to process it? – AndrewR Apr 22 '11 at 21:22
0

Have you tried the Windows Task Scheduler (it comes with Windows by default)? You will need to provide the path to PHP and the path to your PHP file. It works well.
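For reference, registering such a task from the command line might look like this (the paths and the task name are assumptions; /SC MINUTE /MO 1 is the smallest interval schtasks offers):

```bat
schtasks /Create /SC MINUTE /MO 1 /TN "ProcessTasks" /TR "C:\php\php.exe C:\jobs\runner.php"
```

To check more often than once a minute, combine this with a loop-and-sleep inside the script itself, as other answers describe.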

J Bourne
  • 1,409
  • 2
  • 14
  • 33
0

Can't you just write a Java/C++ program that queries for you at a set time interval? You can add it to the list of startup programs so it's always running. Once a task is found, it can even handle it on a separate thread, process more requests, and mark others complete.

Atticus
  • 6,585
  • 10
  • 35
  • 57
0

The simplest way is to use the built-in Windows scheduler.

Run your script with php-cli.exe, with a php.ini filled in with the extensions your script needs.

But I should say that in practice you don't need such a short time interval for your scheduled jobs. Run some tests to find the best interval for yours. Setting a time interval of less than 1 minute is not recommended.

And another little piece of advice: create a lock file at the beginning of your script (PHP's flock function) and check whether you can acquire the lock, to prevent two or more copies running at the same time; at the end of your script, unlink the file after unlocking it.
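A minimal sketch of that locking pattern (the lock file path is an assumption):

```php
<?php
$fp = fopen('C:\\jobs\\runner.lock', 'c'); // 'c' mode creates the file if missing

if (!flock($fp, LOCK_EX | LOCK_NB)) {  // non-blocking: fail fast if another copy holds the lock
    exit("Another copy is already running.\n");
}

// ... do the actual work here ...

flock($fp, LOCK_UN); // release the lock
fclose($fp);
```

One caveat worth noting: unlinking the lock file after unlocking can race with another copy that has just opened it, so it is often safer to leave the file in place.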

If you have to write results to the DB, try using MySQL TRIGGERS instead of PHP, or use MySQL events.

Igor Popov
  • 924
  • 8
  • 14