I strongly recommend that you look into tools that have already solved this problem, like PGQ. Queuing is way harder than you expect. This is not a wheel you want to reinvent.
Concurrency is hard
Mihai's answer looks superficially fine, but falls down somewhat in concurrent operation.

Two concurrent UPDATEs can pick the same row (which in his example has used_flag = FALSE). One of them will get the lock and proceed. The other will wait until the first runs and commits. When the commit occurs the second update will then get the lock, re-check its condition, find that no rows match any more, and do nothing. Thus it's possible - in fact highly likely - for all but one of a set of concurrent updates to return an empty set.
In READ COMMITTED mode you can still get decent results, roughly equivalent to a single session looping over the UPDATE continuously. In SERIALIZABLE mode it'll fail hopelessly. Try it; here's the setup:
CREATE TABLE paths (
    used_flag boolean not null default 'f',
    when_entered timestamptz not null default current_timestamp,
    data text not null
);

INSERT INTO paths (data) VALUES
('aa'),('bb'),('cc'),('dd');
and here's the demo. Try it with three concurrent sessions, following along step by step. Do it once in READ COMMITTED, then again with all sessions SERIALIZABLE by using BEGIN ISOLATION LEVEL SERIALIZABLE instead of plain BEGIN. Compare the results.
SESSION 1                  SESSION 2                  SESSION 3

BEGIN;
                           BEGIN;
UPDATE paths
SET used_flag = TRUE
WHERE used_flag = FALSE
RETURNING data;
                                                      BEGIN;
                                                      INSERT INTO
                                                      paths(data)
                                                      VALUES
                                                      ('ee'),('ff');
                                                      COMMIT;
                           UPDATE paths
                           SET used_flag = TRUE
                           WHERE used_flag = FALSE
                           RETURNING data;
                                                      BEGIN;
                                                      INSERT INTO
                                                      paths(data)
                                                      VALUES
                                                      ('gg'),('hh');
                                                      COMMIT;
COMMIT;
In READ COMMITTED the first UPDATE succeeds and produces four rows. The second produces the remaining two, ee and ff, that were inserted and committed after the first update ran. gg and hh are not returned by the second update even though it actually executes after they're committed, because it's already selected its rows and is waiting on a lock by the time they're inserted.
In SERIALIZABLE isolation the first UPDATE succeeds and produces four rows. The second fails with ERROR: could not serialize access due to concurrent update. In this case SERIALIZABLE isolation won't help you out, it'll just change the nature of the failure.
Without the explicit transactions the same thing will happen when UPDATEs run concurrently; it's just easier to demo without fiddling with timing if you use explicit transactions.
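If you want to see the autocommit version race without timing tricks, you can line the sessions up with the LOCK TABLE trick. This is a sketch, not part of the original demo; the exact winner varies run to run:

```sql
-- Session A: take a table lock so the two plain (autocommit) UPDATEs
-- below queue up behind it and then race when it's released.
BEGIN;
LOCK TABLE paths;

-- Sessions B and C: no explicit transaction, both issued while A holds
-- the lock; both block here.
UPDATE paths
SET used_flag = TRUE
WHERE used_flag = FALSE
RETURNING data;

-- Session A: release the lock. One of B and C grabs and returns the
-- rows; the other re-checks its condition and returns an empty set.
COMMIT;
```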
What about picking just one row?
As above, the system works OK-ish, but what if you want to get only the oldest row? Because UPDATE picks the row(s) it's going to operate on before blocking on a lock, you'll find that in any given set of transactions only one UPDATE will return a result.
You'll be thinking of tricks like:
UPDATE paths
SET used_flag = TRUE
WHERE entry_id = (
    SELECT entry_id
    FROM paths
    WHERE used_flag = FALSE
    ORDER BY when_entered
    LIMIT 1
)
AND used_flag = FALSE
RETURNING data;
or
UPDATE paths
SET used_flag = TRUE
WHERE entry_id = (
    SELECT min(entry_id)
    FROM paths
    WHERE used_flag = FALSE
)
AND used_flag = FALSE
RETURNING data;
but these won't work the way you expect: when run concurrently, both will pick the same target row. One will proceed; the other will block on the lock until the first commits, then proceed and return an empty result. Without the second AND used_flag = FALSE I think they can even return duplicates! Try it after adding an entry_id SERIAL PRIMARY KEY column to the demo paths table above. To get them to race, just LOCK TABLE paths in a 3rd session; see the examples I give in the following links.
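Here's a sketch of that race, assuming you've added the entry_id column as described; the exact interleaving will vary from run to run:

```sql
-- One-time setup for this variant of the demo:
ALTER TABLE paths ADD COLUMN entry_id SERIAL PRIMARY KEY;

-- Session 3: hold a table lock so the two updating sessions start
-- together when it's released.
BEGIN;
LOCK TABLE paths;

-- Sessions 1 and 2: both issue the "pick the oldest row" UPDATE and
-- block on the table lock, each having already taken its snapshot.
-- Note the deliberately missing second used_flag check.
UPDATE paths
SET used_flag = TRUE
WHERE entry_id = (
    SELECT entry_id
    FROM paths
    WHERE used_flag = FALSE
    ORDER BY when_entered
    LIMIT 1
)
RETURNING data;

-- Session 3: release the lock and watch sessions 1 and 2 fight over
-- the same row. Without the extra AND used_flag = FALSE both can end
-- up updating - and returning - that row.
COMMIT;
```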
I wrote about these issues in another answer and in my answer to "Can multiple threads cause duplicate updates on a constrained set?".
Seriously, go check out PGQ. It's solved this for you already.