I have a table in my PostgreSQL (actually its mulitple tables, but for the sake of simplicity lets assume its just one) and multiple clients that periodically need to query the table to find changed items. These are updated or inserted items (deleted items are handled by first marking them for deletion and then actually deleting them after a grace period).
Now the obvious solution would be to just keep a “modified” timestamp column for each row, remember it for each each client and then simply fetch the changed ones
SELECT * FROM the_table WHERE modified > saved_modified_timestamp;
The modified column would then be kept up to date using triggers like:
CREATE FUNCTION update_timestamp()
RETURNS trigger
LANGUAGE ‘plpgsql’
AS $$
BEGIN
NEW.modified = NOW();
RETURN NEW;
END;
$$;
CREATE TRIGGER update_timestamp_update
BEFORE UPDATE ON the_table
FOR EACH ROW EXECUTE PROCEDURE update_timestamp();
CREATE TRIGGER update_timestamp_insert
BEFORE INSERT ON the_table
FOR EACH ROW EXECUTE PROCEDURE update_timestamp();
The obvious problem here is that NOW()
is the time the transation started. So it might happen that a transaction is not yet commited while fetching the updated rows and when its commited, the timestamp is lower than the saved_modified_timestamp, so the update is never registered.
I think I found a solution that would work and I wanted to see if you can find any flaws with this approach.
The basic idea is to use xmin (or rather txid_current()
) instead of the timestamp and then when fetching the changes, wrap them in an explicit transaction with REPEATABLE READ
and read txid_snapshot()
(or rather the three values it contains txid_snapshot_xmin()
, txid_snapshot_xmax()
, txid_snapshot_xip()
) from the transaction.
If I read the postgres documentation correctly, then all changes made transactions that are < txid_snapshot_xmax()
and not in txid_snapshot_xip()
should be returned in that fetch transaction. This information should then be all that is required to get all the update rows when fetching again. The select would then look like this, with xmin_version
replacing the modified
column:
SELECT * FROM the_table WHERE
xmin_version >= last_fetch_txid_snapshot_xmax OR xmin_version IN last_fetch_txid_snapshot_xip;
The triggers would then be simply like this:
CREATE FUNCTION update_xmin_version()
RETURNS trigger
LANGUAGE ‘plpgsql’
AS $$
BEGIN
NEW.xmin_version = txid_current();
RETURN NEW;
END;
$$;
CREATE TRIGGER update_timestamp_update
BEFORE UPDATE ON the_table
FOR EACH ROW EXECUTE PROCEDURE update_xmin_version();
CREATE TRIGGER update_timestamp_update_insert
BEFORE INSERT ON the_table
FOR EACH ROW EXECUTE PROCEDURE update_xmin_version();
Would this work? Or am I missing something?