3

We have a few different tasks (eg. process image, process video) that are created when a user uploads media. The concept we have at the moment is to have a primary Task that is the container for all task types, and a Subtask which has all the metadata required to process the task.

I've seen countless cases where the answer is to use Redis, but we would like to keep a task history to calculate things like average task time, and the metadata of subtasks can be complex.

I'm not very experienced with PostgreSQL so see this as pseudocode:

BEGIN TRANSACTION;

-- Find an unclaimed task.
-- Claim the task.
-- Prevent another worker from also claiming this task.
UPDATE (
    SELECT FROM subtasks
    INNER JOIN tasks
    ON tasks.id = subtasks.id
    WHERE tasks.started_at = NULL   -- Not claimed
    ORDER BY tasks.created_at ASC   -- First in, first out 
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- Don't wait for it, keep looking.
)
SET tasks.started_at = now()
RETURNING *

-- Use metadata from the subtask to perform the task.
-- If an error occurs we can roll back, unlocking the row.
-- Will this also roll back if the worker dies?

-- Mark the task as complete.
UPDATE tasks
SET completed_at = now()
WHERE tasks.id = $id

END TRANSACTION;

Will this work?


Edit 1: Using clock_timestamp() and no subselect.

BEGIN TRANSACTION

    -- Find an unclaimed task.
    SELECT FROM subtasks
    INNER JOIN tasks
    ON tasks.id = subtasks.id
    WHERE tasks.started_at = NULL   -- Not claimed
    ORDER BY tasks.created_at ASC   -- First in, first out
    LIMIT 1                         -- We only want to select a single task.
    FOR UPDATE SKIP LOCKED          -- Don't wait for it, keep looking.

    -- Claim the task.
    UPDATE tasks
    SET started_at = clock_timestamp()
    WHERE id = $taskId


    -- Use metadata from the subtask to perform the task.
    -- If an error occurs we can roll back, unlocking the row.
    -- Will this also roll back if the worker dies?

    -- Task is complete.

    -- Mark the task as complete.
    UPDATE tasks
    SET completed_at = clock_timestamp()
    WHERE id = $taskId

END TRANSACTION 
rtheunissen
  • 7,347
  • 5
  • 34
  • 65
  • You need to use `clock_timestamp()` if you want to have actual time when query is executed inside transaction. `now()` will always have the same timestamp of transaction start. You can't `UPDATE` subselect. You can't do `SELECT FROM` – Łukasz Kamiński Apr 10 '17 at 10:03
  • http://stackoverflow.com/q/32946852/330315 or http://stackoverflow.com/q/22765054/330315 or https://brandur.org/postgres-queues –  Apr 10 '17 at 10:03
  • https://blog.2ndquadrant.com/what-is-select-skip-locked-for-in-postgresql-9-5/ – rtheunissen Apr 10 '17 at 10:42

0 Answers0