4

The scenario:

TL;DR - I need a queue system for triggering jobs based on a future timestamp and NOT on the order it is inserted

I have a MySQL database of entries that detail particular events that need to be performed (which will consist mostly of a series of arithmetic calculations and a database insert/update) in a precise sequence based on timestamps. The time the entry is inserted and when the event will be "performed" has no correlation and is determined by outside factors. The table also contains a second column of milliseconds which increases the timing precision.

This table is part of a job "queue" which will contain entries set to execute from anywhere between a few seconds to a few days in the future, and can potentially have up to thousands of entries added every second. The queue needs to be parsed constantly (every second?) - perhaps by doing a select of all timestamps that have expired during this second and sorting by the milliseconds, and then executing each event detailed by the entries.

The problem

Currently the backend is completely written in PHP on an apache server with MySQL (ie standard LAMP architecture). Right now, the only way I can think of to achieve what I've specified is to write a custom PHP job queue script that will do the parsing and execution, looped every second using this method. There are no other job systems that I'm aware of which can queue jobs according to a specified timestamp/millisecond rather than the entry time.

This method however sounds rather infeasible CPU wise even on paper - I have to perform a huge MySQL query every second and execute some sort of function for each row retrieved, with the possibility of it running over a second of execution time which will start introducing delays to the parsing time and messing up the looping script.

I am of course attempting to create a solution that will be scalable should there be heavy traffic on the system, which this solution fails miserably as it will continue falling behind as the number of entries get larger.

The questions

I'd prefer to stick to the standard LAMP architecture, but is there any other technology I can integrate nicely into the stack that is better equipped to deal with what I'm attempting to do here?

Is there another method entirely to to accurately trigger events at a specified future date without the messy fiddling about with the constant queue checking?

If neither of the above options are suitable, is there a better way to loop the PHP script in the background? In the worst case scenario I can accept the long execution times and split the task up between multiple 'workers'.

Update

RabbitMQ was a good suggestion, but unfortunately doesn't execute the task as soon as it 'expires' - it has to go through a queue first and wait up on any tasks in front that have yet to expire. The expiry time has a wide range between a few seconds to a few days, and the queue needs to be sorted somehow each time a new event is added in so the expiry time is always in order in the queue. This isn't possible as far as I'm aware of in RabbitMQ, and doesn't sound very efficient either. Is there an alternative or a programmatic fix?

Community
  • 1
  • 1
user1334061
  • 43
  • 1
  • 5

1 Answers1

0

Sometimes, making a square peg fit into a round hole takes too much effort. While using MySQL to create queues can be effective, it gets much trickier to scale. I would suggest that this might be an opportunity for RabbitMQ.

Basically, you would setup a message queue that you can put the events into. You would then have a "fanout" architecture with your workers processing each queue. Each worker would listen to the queue and check to see if the particular event needs to be processed. I imagine that a combination of "Work Queues" and the "Routing" techniques available in Rabbit would achieve what you are looking for in a scalable and reliable way.

I would envision a system that works something like this:

  1. spawn workers to listen to queues, using routing keys to prune down how many messages they get
  2. each worker checks the messages to see if they are to be performed now
  3. if the message is to be performed, perform it and acknowledge -- otherwise, re-dispatch the message for future processing. There are some simple techniques available for this.

As you need more scale, you add more workers. RabbitMQ is extremely robust and easy to cluster as well when you eventually cap out your queue server. There are also other cloud-based queuing systems such as Iron.IO and StormMQ

Community
  • 1
  • 1
  • 1
    I did some research into RabbitMQ and it looked VERY promising until I came across this: http://www.rabbitmq.com/ttl.html (caveats). I need to set per message TTL to define the time when the task will be performed, but like I mentioned, it could range from seconds to days. I need the task to be executed as soon as the timer expires, but it looks like rabbitMQ will wait for the head of the queue to pop out first. I need some way to sort the queue so the tasks that will expire first are at the head of the queue and will pop out in order. Is there a solution to this? – user1334061 Apr 02 '14 at 01:25
  • Probably not, here's how I would try to handle it though: make the message itself have the execution time in it. The jobs that receive the messages could acknowledge immediately, read the time, if it's now or past, execute. If it's in the future, re-dispatch with the same time so that another job could pick it up. I would make all the workers sleep some small amount of time between grab and redispatch (1-100ms) so you don't busy-wait the queue server. –  Apr 06 '14 at 00:38