
Is there a way to generate some kind of in-order identifier for records in a table?

Suppose that we have two threads doing queries:

Thread 1:

begin;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;

Thread 2:

begin;
insert into table1(id, value) values (nextval('table1_seq'), 'world');
commit;

It's entirely possible (depending on timing) that an external observer would see the (2, 'world') record appear before the (1, 'hello') record.

That's fine, but I want a way to get all the records in table1 that have appeared since the last time the external observer checked.

So, is there any way to get the records in the order they were inserted? Maybe OIDs can help?

Cyberax

4 Answers


No. Since there is no natural order of rows in a database table, all you have to work with is the values in your table.

Well, there are the Postgres specific system columns cmin and ctid you could abuse to some degree.

The tuple ID (ctid) contains the file block number and the position within the block for the row. So it represents the current physical ordering on disk. Later additions will normally have a bigger ctid. Your SELECT statement could look like this:

SELECT *, ctid   -- save ctid from last row in last_ctid
FROM   tbl
WHERE  ctid > last_ctid
ORDER  BY ctid;

ctid has the data type tid. Example: '(0,9)'::tid

However, it is not stable as a long-term identifier, since VACUUM, any concurrent UPDATE, or some other operations can change the physical location of a tuple at any time. It is stable for the duration of a transaction, though. And if you are just inserting and nothing else, it should work locally for your purpose.

I would add a timestamp column with default now() in addition to the serial column ...

I would also let a column default populate your id column (a serial or IDENTITY column). That retrieves the number from the sequence at a later stage than explicitly fetching and then inserting it, thereby minimizing (but not eliminating) the window for a race condition - the chance that a lower id is inserted at a later time.
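A minimal sketch of such a table, assuming illustrative names (tbl, ts) and PostgreSQL 10+ for the IDENTITY syntax:

```sql
-- Both id and ts are filled by column defaults at insert time,
-- as late as possible, narrowing the race-condition window.
CREATE TABLE tbl (
  id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  ts    timestamptz NOT NULL DEFAULT now(),
  value text
);

INSERT INTO tbl (value) VALUES ('hello');  -- id and ts assigned here
```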

Erwin Brandstetter
  • I don't think relying on physical row location is a valid strategy here. An aborted transaction, a lost database connection, etc. can all cause a later row to be placed in the wrong order. Also, using a serial column isn't sufficient, as there can be an arbitrary delay between inserting a row and committing the thread 1 transaction, which can allow the thread 2 transaction to be committed and seen before it. – Tometzky Jul 07 '13 at 15:40
  • @Tometzky: I agree this is not reliable. For even more reasons than I already mentioned. It only excludes *some* reasons for out-of order tuples (like getting ids from the sequence separately), but not others. – Erwin Brandstetter Jul 08 '13 at 18:07
  • I tried to abuse CTID but it didn't work as I also want to delete some rows from time to time. Would be nice to have an automatic monotonically increasing counter, but I guess I'll have to do explicit locking. – Cyberax Jul 09 '13 at 22:01
  • If table is append-only without any `UPDATE` or `DELETE` and `AUTO_VACUUM` is disabled, then is `ctid` monotonically increasing by each insert? – Majid Azimi Jul 25 '17 at 21:08
  • @MajidAzimi: It's the default behavior that inserted rows are appended to the physical end of the table. But it's an implementation detail and there are no guarantees. And once you have concurrent transactions, rows may appear to be out of order even then. – Erwin Brandstetter Jul 26 '17 at 02:44

What you want is to force transactions to commit (making their inserts visible) in the same order that they did the inserts. As far as other clients are concerned the inserts haven't happened until they're committed, since they might roll back and vanish.

This is true even if you don't wrap the inserts in an explicit begin / commit. Transaction commit, even if done implicitly, still doesn't necessarily run in the same order that the row itself was inserted. It's subject to operating system CPU scheduler ordering decisions, etc.

Even if PostgreSQL supported dirty reads this would still be true. Just because you start three inserts in a given order doesn't mean they'll finish in that order.

There is no easy or reliable way to do what you seem to want that will preserve concurrency. You'll need to do your inserts in order on a single worker - or use table locking as Tometzky suggests, which has basically the same effect since only one of your insert threads can be doing anything at any given time.

You can use advisory locking, but the effect is the same.
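For example, with a transaction-scoped advisory lock (the lock key 42 is an arbitrary, application-chosen constant):

```sql
BEGIN;
-- Only one session at a time can hold this advisory lock, so commits
-- happen in insert order. It is released automatically at COMMIT/ROLLBACK.
SELECT pg_advisory_xact_lock(42);
INSERT INTO table1(id, value) VALUES (nextval('table1_seq'), 'hello');
COMMIT;
```

Concurrency is lost either way: every inserting session queues on the same lock.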

Using a timestamp won't help, since you don't know if for any two timestamps there's a row with a timestamp between the two that hasn't yet been committed.

You can't rely on an identity column where you read rows only up to the first "gap" because gaps are normal in system-generated columns due to rollbacks.

I think you should step back and look at why you have this requirement and, given this requirement, why you're using individual concurrent inserts.

Maybe you'll be better off doing small-block batched inserts from a single session?
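A sketch of that approach: a single session batches rows into one multi-row INSERT, so visibility order trivially matches insert order:

```sql
BEGIN;
-- All rows become visible atomically at COMMIT, in the order written.
INSERT INTO table1(id, value)
VALUES (nextval('table1_seq'), 'hello'),
       (nextval('table1_seq'), 'world');
COMMIT;
```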

Craig Ringer
  • I'm facing a similar challenge and my reason to do that is to expose a feed of events. Each row represents an event. The contract of the feed should be that pages are immutable. Clients will then request pages as `page?after=&limit=XX`. However, inserts out of order can cause `page?after=3` to return `[5]` the first time and `[4,5]` after "4" was committed. It's an insert-only table, so it seemed a good base for the feed. But with serial numbers appearing out of order I'm pondering if there is a better way. We're currently looking back a number of pages to check for new entries. – acorello Sep 27 '17 at 10:20
  • This is exactly what logical decoding is for. Look into wal2json, pglogical, etc. – Craig Ringer Sep 28 '17 at 00:12

We found another solution on recent PostgreSQL servers, similar to @erwin's answer but based on txid.

When inserting rows, instead of using a sequence, insert txid_current() as row id. This ID is monotonically increasing on each new transaction.

Then, when selecting rows from the table, add to the WHERE clause id < txid_snapshot_xmin(txid_current_snapshot()).

txid_snapshot_xmin(txid_current_snapshot()) corresponds to the transaction ID of the oldest transaction that is still open. Thus, if row 20 is committed before row 19, it will be filtered out because transaction 19 is still open. When transaction 19 commits, both rows 19 and 20 become visible.

When no other transaction is open, the snapshot xmin will be the transaction ID of the currently running SELECT statement.
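Putting the two halves together (last_seen_id is a placeholder for whatever the observer remembered from its previous poll):

```sql
-- Writer: the row id is the writing transaction's ID (a plain bigint).
INSERT INTO table1(id, value) VALUES (txid_current(), 'hello');

-- Observer: skip rows that an older, still-open transaction could
-- still insert "behind", so no committed row is ever missed.
SELECT *
FROM   table1
WHERE  id > last_seen_id
AND    id < txid_snapshot_xmin(txid_current_snapshot())
ORDER  BY id;
```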

The returned transaction IDs are 64-bit: the higher 32 bits are an epoch and the lower 32 bits are the actual ID.

Here is the documentation of these functions: https://www.postgresql.org/docs/9.6/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT

Credits to tux3 for the idea.

Philippe
  • Interesting. So you will get records in the order of transaction start time and will not miss anything. I need to insert more than a single row per transaction sometimes, but I guess I can just use txid_current()*1000, +1, +2, +3, ... Have you used that in production? Any issues with it? – bodrin Jun 13 '20 at 18:13
  • Instead of doing the `*1000 +1, +2...` trick, maybe you can just add a non-unique column and insert `txid_current()` there instead of using the `id` column of your table. You'll have multiple rows with the same txid, but the WHERE clause should work just as well. We ended up not using this solution at all and avoided the problem by completely changing the way we inserted and requested records from our API, so this never reached production. – Philippe Jun 14 '20 at 16:59
  • @Philippe any blog posts about your solution? :) – Bruno Medeiros Dec 07 '20 at 05:24
  • Nice idea. The danger is that someone may accidentally leave a write transaction running. This will cause the query never to 'see' any new data until that transaction is committed. – richard Mar 26 '23 at 23:04

If you mean that every query that sees the 'world' row must also see the 'hello' row, then you'd need to do:

begin;
lock table table1 in share update exclusive mode;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;

This share update exclusive mode is the weakest lock mode which is self-exclusive — only one session can hold it at a time.

Be aware that this will not make this sequence gap-less — this is a different issue.

Tometzky