2

I have a process users must go through on my site which can take quite a bit of time (upwards of an hour in certain cases).

I'd like to be able to have the user start the process, then be told that it is running in the background and they can leave the page and will be emailed when the process is complete. This would help avoid cases when the user gets impatient and closes the window before the process has finished.

An example of how it would ideally look is how Mailchimp handles importing contacts. You upload a CSV file of your contacts, and they then say that the contacts are currently uploading, but it can take a while so feel free to leave the page.

What would be the best way to accomplish this? I looked into Gearman; however, it seems like that tool is more useful for spreading a large number of tasks across workers so they finish quickly, not for running a single long process in the background.

Thanks for your help.

Aaron Marks

3 Answers

4

Even if it doesn't look like the obvious choice at first glance, I think I would use Gearman for that:

  • You can push tasks to it when the user triggers the action
  • It'll deal with both:
    • balancing tasks to several servers, if you have more than one
    • queuing, so no more than X tasks are executed in parallel.
  • No need to re-invent the wheel ;-)
Pascal MARTIN
  • I was under the impression that Gearman would take the pushed job, but the user would still need to wait for the page to finish loading until the worker fed back the result? The solution I'm looking for will allow the user to exit the page, but still have the job (which might take an entire hour) be processed in the background. – Aaron Marks Mar 25 '11 at 18:43
  • You can push jobs to the background with Gearman. See for instance http://fr2.php.net/manual/en/gearmanclient.dobackground.php – Pascal MARTIN Mar 25 '11 at 18:45
  • @Aaron Gearman runs the job in a process separate from the user's request, so the user can safely close the browser and the job will keep running in the background. – Jan Hančič Mar 25 '11 at 18:46
1

You might want to take a look at creating a daemon. I'd suggest writing the daemon in a language other than PHP (node.js maybe?), but if you already have a large(ish) code base in PHP this mightn't be desirable. Try taking a look at "How to design a daemon with a MySQL DB connection".

I've been working on a PHP library called LooPHP that allows event-driven programming in PHP (often desirable for daemons). The library supports timed events and multi-threaded listeners (for when you want one event queue to be fed from more than one type of source).

If you could give us some more information on what exactly this background process does, it might be helpful.

Kendall Hopkins
  • Sure, thanks for the initial response as well. The background process is for gathering location analytics about a specific Twitter or Myspace user. For Twitter, the length comes from making calls to the Twitter API, and then verifying the locations of the found users via the Mapquest API. For Myspace, we are crawling each friend of the specific user's profile, which can take very long for users with thousands of friends. – Aaron Marks Mar 25 '11 at 18:47
  • Sounds like a good fit for Gearman or a custom daemon (if you need more control than Gearman offers). – Kendall Hopkins Mar 25 '11 at 18:50
0

Write out a file using the user's ID as the filename. Spawn a new process to perform whatever it is you want it to do (if what you want is to have it execute some more PHP, then you can just call PHP with the script you want to run). When that process is done, have it delete that file. If the user visits the page again, have the script check for existence of the file (since the filename is predictable based on user ID). If it exists, then you're still processing, so tell them to continue waiting. Maybe have some upper bound to wait, where if they come back and the file exists, but it's been, say, 5 hours, delete the file and let them try again.
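The steps above could be sketched roughly as follows. This is only an illustration in Python using the standard library, not something from the original setup: the names `marker_path`, `job_status`, `start_job`, the marker directory, and the `job-<id>.lock` filename are all made up for the example, and it assumes user IDs are integers and that the host is POSIX (for `start_new_session`). The marker is created with `O_EXCL` so two simultaneous requests cannot both start a job.

```python
import os
import subprocess
import sys
import tempfile
import time

# Assumption: markers live in the system temp directory.
MARKER_DIR = tempfile.gettempdir()
# The "say, 5 hours" upper bound from the answer, in seconds.
STALE_AFTER = 5 * 60 * 60

# Small wrapper run in the child: execute the real command, then delete the marker.
_WRAPPER = """
import os, subprocess, sys
marker, command = sys.argv[1], sys.argv[2:]
try:
    subprocess.call(command)
finally:
    if os.path.exists(marker):
        os.remove(marker)
"""


def marker_path(user_id):
    """Predictable marker filename derived from the user's ID."""
    return os.path.join(MARKER_DIR, "job-%d.lock" % int(user_id))


def job_status(user_id, now=None):
    """Return 'idle', 'running', or 'stale' (stale markers are deleted)."""
    path = marker_path(user_id)
    try:
        started = os.path.getmtime(path)
    except FileNotFoundError:
        return "idle"
    now = time.time() if now is None else now
    if now - started > STALE_AFTER:
        os.remove(path)  # give up on the old job and let the user retry
        return "stale"
    return "running"


def start_job(user_id, command):
    """Create the marker and spawn `command` detached; False if already running."""
    if job_status(user_id) == "running":
        return False
    try:
        # O_EXCL makes creation atomic: only one request can win.
        fd = os.open(marker_path(user_id), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    subprocess.Popen(
        [sys.executable, "-c", _WRAPPER, marker_path(user_id), *command],
        start_new_session=True,  # detach from the web request's session (POSIX)
    )
    return True
```

On each page visit you would call `job_status(user_id)` first and only call `start_job(user_id, [...])` when it reports `idle` or `stale`; sending the completion email would be the last thing the spawned command does.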

Doug Kavendek
  • This probably won't scale well as it doesn't allow for queueing of jobs. Also, upper bounds aren't really what you want, as a job can sometimes legitimately take a long time (under heavy load). You should probably use a .pid file if you want to go w/ this approach. – Kendall Hopkins Mar 25 '11 at 18:44
  • Sounds reasonable, I wasn't really thinking in larger scales. I set up something like what I mentioned, but it was for jobs that would take at most 3-5 seconds. – Doug Kavendek Mar 25 '11 at 18:46