PostgreSQL: copy the table rows and get the old-new matching information

Question

Here we have a certain table:

CREATE TABLE mytbl (
  id int PRIMARY KEY generated by default as identity,
  col1 int,
  col2 text, ...
);

We need to copy part of the rows of the table and get information about the correspondence of the old and new IDs (in order to then copy the related data from other tables).

I tried such a query:

insert into mytbl (col1, col2)
    select col1, col2
        from mytbl old
        where col1 = 999 -- some condition
    returning 
        old.id as old_id,
        id as new_id;

But this query produces the error: ERROR: missing FROM-clause entry for table "old"

Is there a way to write a similar query without an error?

What I've already thought about:

copy one line at a time in a loop — apparently I will have to, unless an easier way is found.
alter table mytbl add column old_id (foreign key references mytbl) and then insert into mytbl (old_id, col1, col2) select id, col1, col2 ... returning id as new_id, old_id — but in reality, we do not need to store this information, because it is needed only at the moment of creating a copy.
INSERT SELECT without RETURNING; and then SELECT old.id as old_id, new.id as new_id FROM mytbl new JOIN mytbl old ON (some connection condition by intelligent key) — is quite difficult in my case.

a) exploit the fact that `INSERT` returns the rows in the exact same order as the selected rows, and join them on `row_number()` or b) use the underlying sequence of the column to generate the mapping beforehand, then insert the new ids — Bergi, Jun 20 '23 at 22:51
@Bergi, as to a) read this thread [Returning order](https://www.postgresql.org/message-id/CAN19dycS0oUorLOGgjxMX8NZq-hOA1RJrzwC4L3FxpMf4BQbBQ%40mail.gmail.com) as to why that is not always the case. In particular https://www.postgresql.org/message-id/1678721.1681566044%40sss.pgh.pa.us: *We already rejected the idea that INSERT must preserve the order of the incoming tuples. Please don't re-propose it with different wording.* — Adrian Klaver, Jun 20 '23 at 22:58
1) Would `insert into mytbl (id, col1, col2) select id, col1, col2 ...` work or are there potential conflicts? 2) Instead of 1) or your foreign key suggestion, have unconstrained `old_id` column in `mytbl1`to hold `id` from `mytbl`. — Adrian Klaver, Jun 20 '23 at 23:02
@AdrianKlaver Interesting, [Erwin says](https://stackoverflow.com/a/29263402/1048572) "*[we rely] on the undocumented implementation detail that rows are inserted in the order provided. It works in all current versions of Postgres and is probably not going to break.*" — Bergi, Jun 21 '23 at 09:39
@Bergi. 1) The quote comes from someone who writes the code and knows more about it then just about anyone. I would take their word as close to the truth as you are going to get. 2) The fact that it works in given circumstances now is a happy side effect that may change at any time. Bottom line order is not guaranteed. — Adrian Klaver, Jun 21 '23 at 15:08
@AdrianKlaver Yes, I definitely trust the postgres core developer's stance. Before that, Erwin's answer was the best advice I could find on the topic, and had used his approach myself in production code (working fine so far…). I plan on switching to the second approach I showed in my answer below, what do you think? — Bergi, Jun 21 '23 at 15:54

score 1 · Accepted Answer · answered Jun 21 '23 at 10:09

You can (but probably shouldn't) rely on the order of inserted rows to match between the selection and the RETURNING clause:

WITH selection AS (
    SELECT id, col1, col2, row_number() OVER (ORDER BY id)
    FROM mytbl
    WHERE col1 = 999 -- some condition
    ORDER BY id
), inserted AS (
    INSERT INTO mytbl (col1, col2)
    SELECT col1, col2
    FROM selection
    ORDER BY selection.id
    RETURNING id
)
SELECT s.id AS old_id, ins.id AS new_id
FROM (SELECT inserted.id, row_number() OVER (ORDER BY inserted.id) FROM inserted) AS ins
JOIN selection USING (row_number);

A probably better approach (which also works nicely when you need to copy in multiple mutually-dependant tables) is to generate the new ids ahead of the insert, using the sequence that is underlying the identity column:

WITH selection AS (
    SELECT nextval(pg_get_serial_sequence('mytbl', 'id')) AS new_id, id AS old_id, col1, col2
    FROM mytbl
    WHERE col1 = 999 -- some condition
), inserted AS (
    INSERT INTO mytbl(id, col1, col2)
    OVERRIDING SYSTEM VALUE -- necessary when using identity columns!
    SELECT new_id, col1, col2
    FROM selection
)
SELECT old_id, new_id
FROM selection;

This does however require USAGE permission on the underlying sequence, which may not be granted by default even if a role can normally use the identity column.

Thank you Bergi. I was already thinking of taking the first advice, but was stopped by Adrian Klaver's warning. The second method is interesting, although it looks like a hack. As a result, I decided to implement copying in a loop. — Alexander Tolmachev, Jun 22 '23 at 17:40
@AlexanderTolmachev I suspect that would be much slower though — Bergi, Jun 22 '23 at 17:58

PostgreSQL: copy the table rows and get the old-new matching information

1 Answers1

Linked