3

I need to process uploaded files, and processing can take as little as 1 second or as much as 10 minutes. Currently my solution is a Quartz job on a 30-second timer that processes an arbitrary job whenever it fires. There are several problems with this.

One: if a job takes only a few seconds, it is wasteful to make it wait up to 30 seconds in the job queue.

Two: if there is only one long job in the queue, the timer could feasibly fire again and try to process it a second time.

What I want is a timeless queue. When things are added, they are started immediately if there is a free worker. Is there a solution for this? I was looking at Jesque, but I couldn't tell if it can do this.

Mikey
  • Jesque can definitely do this. This is the basic idea behind a message queue (as opposed to a job scheduler like Quartz, which really isn't a queue) – cdeszaq Mar 01 '12 at 22:44

2 Answers

2

What you are looking for is a basic message queue. There are lots of options out there, but my favorite for Grails is RabbitMQ. The Grails plugin for it is quite good and it performs well in my experience.

In general, message queues allow you to have N producers (things creating "jobs") adding work messages to a queue and M consumers pulling jobs off the queue and processing them. When a worker completes its job, it simply asks the queue for the next job to process, and if there is none, it waits for the queue to give it something to do. The queue also keeps track of the success or failure of message processing (you can control this) so that you don't give the same message to more than one worker.

This has the advantage of not relying on polling (so you can start processing as soon as things come in), and it is also much more scalable. You can scale both your producers and consumers up or down as needed, decoupling the inputs from the outputs so that you can absorb a traffic spike and then work your way through it as workers become available.
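To make the producer/consumer idea concrete, here is a minimal in-process sketch in plain Java. It is not RabbitMQ or Jesque: it uses a fixed thread pool from `java.util.concurrent` as a stand-in for the queue and its workers, so it only illustrates the "start immediately if a worker is free, otherwise wait" behaviour within a single JVM. The class and method names (`UploadWorkQueue`, `submit`, `process`) are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process stand-in for a message queue with M consumers.
class UploadWorkQueue {
    private final ExecutorService workers;
    private final List<String> processed = new CopyOnWriteArrayList<>();

    UploadWorkQueue(int workerCount) {
        // A fixed pool has an internal unbounded queue: a submitted job runs
        // at once if a worker is idle, otherwise it waits for the next free one.
        this.workers = Executors.newFixedThreadPool(workerCount);
    }

    // Producer side: hand a file over for processing; returns immediately.
    void submit(String fileName) {
        workers.execute(() -> process(fileName));
    }

    // Placeholder for the real processing step (1 second to 10 minutes).
    private void process(String fileName) {
        processed.add(fileName);
    }

    // Stop accepting work and wait for the queued jobs to finish.
    boolean shutdownAndWait(long timeoutSeconds) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    List<String> processedFiles() {
        return processed;
    }
}
```

Each submitted job is taken by exactly one worker thread, so nothing is processed twice. What this sketch does not give you is persistence across restarts or workers on other machines, which is where a real broker comes in.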

cdeszaq
-1

To solve problem one, just make the job check for newly uploaded files every 5 seconds (or 3 seconds, or 1 second). If the check for uploaded files is quick, then there is no reason you can't run it often.

For problem two, you just need to record when you start processing a file to ensure it doesn't get picked up twice. You could create a table in the database, or store the information in memory somewhere.
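As a rough sketch of the in-memory variant, the "record when you start" step could be an atomic claim on the file name. The names here (`InProgressRegistry`, `tryClaim`, `release`) are hypothetical, and this assumes all polling happens inside one JVM; with several application instances you would need the database table instead.

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory record of which files are currently being processed.
class InProgressRegistry {
    // File name -> time at which processing started.
    private final ConcurrentMap<String, Instant> startedAt = new ConcurrentHashMap<>();

    // Atomically claims a file. Returns true only for the first caller,
    // so a second poll that sees the same file will skip it.
    boolean tryClaim(String fileName) {
        return startedAt.putIfAbsent(fileName, Instant.now()) == null;
    }

    // Call when processing finishes (or fails) so the entry does not linger.
    void release(String fileName) {
        startedAt.remove(fileName);
    }
}
```

The polling job would call `tryClaim` before starting work and `release` in a `finally` block. The stored start time is there so that a separate check could spot entries that have been claimed for suspiciously long.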

David
  • Polling doesn't scale particularly well, and adding a database into the loop just creates additional resource contention. Especially as an application moves towards more real-time processing, polling breaks down very quickly. – cdeszaq Mar 01 '12 at 22:42
  • My problem with this is that I have to do all sorts of checks against that second flag to solve for the case where processing hangs. – Mikey Mar 01 '12 at 23:20
  • @Mikey - No worries. But unless I am missing something you will still need to have code to handle jobs that hang even if you are using a queue to schedule them. – David Mar 02 '12 at 01:13
  • @Mikey, Most message queues have configurable ack/nack timeout policies that will automatically put a message back into the queue if it doesn't get an indication that the message was successfully processed soon enough. There's also the case of bad messages, but that's different from a hung message. – cdeszaq Mar 02 '12 at 14:34