We have a few different tasks (eg. process image, process video) that are created when a user uploads media. The concept we have at the moment is to have a primary Task
that is the container for all task types, and a Subtask
which has all the metadata required to process the task.
I've seen countless cases where the answer is to use Redis, but we would like to keep a task history to calculate things like average task time, and the metadata of subtasks can be complex.
I'm not very experienced with PostgreSQL so see this as pseudocode:
BEGIN TRANSACTION;
-- Find an unclaimed task.
-- Claim the task.
-- Prevent another worker from also claiming this task.
UPDATE (
SELECT FROM subtasks
INNER JOIN tasks
ON tasks.id = subtasks.id
WHERE tasks.started_at = NULL -- Not claimed
ORDER BY tasks.created_at ASC -- First in, first out
LIMIT 1
FOR UPDATE SKIP LOCKED -- Don't wait for it, keep looking.
)
SET tasks.started_at = now()
RETURNING *
-- Use metadata from the subtask to perform the task.
-- If an error occurs we can roll back, unlocking the row.
-- Will this also roll back if the worker dies?
-- Mark the task as complete.
UPDATE tasks
SET completed_at = now()
WHERE tasks.id = $id
END TRANSACTION;
Will this work?
Edit 1: Using clock_timestamp()
and no subselect.
BEGIN TRANSACTION
-- Find an unclaimed task.
SELECT FROM subtasks
INNER JOIN tasks
ON tasks.id = subtasks.id
WHERE tasks.started_at = NULL -- Not claimed
ORDER BY tasks.created_at ASC -- First in, first out
LIMIT 1 -- We only want to select a single task.
FOR UPDATE SKIP LOCKED -- Don't wait for it, keep looking.
-- Claim the task.
UPDATE tasks
SET started_at = clock_timestamp()
WHERE id = $taskId
-- Use metadata from the subtask to perform the task.
-- If an error occurs we can roll back, unlocking the row.
-- Will this also roll back if the worker dies?
-- Task is complete.
-- Mark the task as complete.
UPDATE tasks
SET completed_at = clock_timestamp()
WHERE id = $taskId
END TRANSACTION