I'm building an application based around a task queue: it serves a series of tasks to multiple, asynchronously connected clients. The twist is that the tasks must be served in a random order.
My problem is that the algorithm I'm using now is computationally expensive, because it relies on many large queries and transfers from the database. I have a strong hunch that there's a cheaper way to achieve the same result, but I can't quite see the solution. Can you think of a clever fix for this problem?
Here's the (computationally expensive) algorithm I'm using now:
When the client queries for a new task...
- Query the database for "unfinished" tasks
- Put all tasks in a list
- Shuffle the list (using random.shuffle)
- Flag the first task as "in progress"
- Send the task parameters to the client for completion
When the client finishes the task...
6a. Record the result and flag the task as "finished."
If the client fails to finish the task by some deadline...
6b. Re-flag the task as "unfinished."
Seems like we could do better by replacing steps 1, 2, and 3, with pseudorandom sequences or hash functions. But I can't quite figure out the whole solution. Ideas?
Other considerations:
- In case it's important, I'm using python and mongodb for all of this. (Mongodb doesn't have some clever "use find_one to efficiently return a random matching entry" usage, does it?)
- The term "queue" is a little misleading. All the tasks are stored in subfields of a single collection within the mongodb. The length (total number of tasks) in the collection is known and fixed at the outset.
- If it's necessary, it might be okay to let the same task be assigned multiple times, as long as the occurrence is rare. But instances of this kind would need to be very rare, because completing each task is costly.
- I have identifying information on each client, so we know exactly who originates each task request.