29

I'd like to move some data from one table to another (with a possibly different schema). The straightforward solution that comes to mind is:

BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO dest_table SELECT data FROM orig_table, other_tables WHERE <condition>;
DELETE FROM orig_table USING other_tables WHERE <condition>;
COMMIT;

Now what if the amount of data is rather big, and the <condition> is expensive to compute? In PostgreSQL, a RULE or a stored procedure can be used to delete the data on the fly, evaluating the condition only once. Which solution is better? Are there other options?

j_random_hacker
IggShaman

4 Answers

73

[Expanding on dvv's answer]

You can move rows to an existing table as follows. If the schemas don't match, specify the columns explicitly.

WITH moved_rows AS (
    DELETE FROM <original_table> a
    USING <other_table> b
    WHERE <condition>
    RETURNING a.* -- or specify columns
)
INSERT INTO <existing_table> --specify columns if necessary
SELECT [DISTINCT] * FROM moved_rows;

But if you want to move the data into a new table (not an existing one), the outer syntax is different:

CREATE TABLE <new_table> AS
WITH moved_rows AS (
    DELETE FROM <original_table> a
    USING <other_table> b
    WHERE <condition>
    RETURNING a.* -- or specify columns
)
SELECT [DISTINCT] * FROM moved_rows;

Note: "USING <other_table> b" joins a with b so that you can delete rows from a based on the join result. If you don't have any other table to join, you can just omit this clause. See the PostgreSQL DELETE documentation for more info.
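For concreteness, here's a hypothetical version of the first template with made-up table and column names, archiving the orders of closed customer accounts into an existing orders_archive table with matching columns:

```sql
-- Hypothetical example: move orders of closed accounts into an archive table.
-- Assumes orders_archive already exists with the same columns as orders.
WITH moved_rows AS (
    DELETE FROM orders a
    USING customers b
    WHERE a.customer_id = b.id
      AND b.status = 'closed'
    RETURNING a.*
)
INSERT INTO orders_archive
SELECT * FROM moved_rows;
```

Because the DELETE and INSERT are one statement, the condition is evaluated only once.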

ADTC
    I use pg10.1 and `CREATE TABLE xxx AS DELETE FROM yyy` results in a syntax error. The latter example here is a good workaround. – kbridge4096 May 10 '19 at 05:40
    Is this operation atomic? Could something happen between `DELETE` and `INSERT`? – Andrey Semakin Jul 20 '20 at 15:56
  • @AndreySemakin A cursory search shows that single statements are not necessarily atomic - at least not in the way you'd think it is. [Here](https://www.google.com/search?q=is+one+very+complex+sql+statement+atomic+postgresql), [Here](https://stackoverflow.com/q/44292587), [Here](https://stackoverflow.com/q/32700837) and [Here](https://stackoverflow.com/q/21468742). You may want to wrap it in `TRANSACTION` if your application requires atomicity. I would suggest you to do more research and study into ACID properties. _(I'm no longer actively dealing with databases, so I can't go any further.)_ – ADTC Sep 25 '20 at 02:06
  • In creating table example code, 1) what should "other_table" be replaced with (if not the new table or the original table which are already specified by "new_table" and "original_table" respectively. 2) How do you specify the condition of the where statement as when the column "xyz" of the original table equals 1? (Using the column name as "WHERE xyz = 1" produces errors) – Andikan Otung Feb 11 '23 at 21:07
    @AndikanOtung it's a second join table. If you don't have one, you can omit it. I have expanded my answer with more details and documentation link. Please check. For your question (2) try `a.xyz`. – ADTC Feb 13 '23 at 06:19
10

You can move data using a SINGLE query in Postgres 9.1 and later. See http://www.postgresql.org/docs/9.1/static/queries-with.html, section "Data-Modifying Statements in WITH".

mnencia
dvv
7

If the condition is so complicated that you don't want to execute it twice (which BTW sounds unlikely to me, but anyway), one possibility would be to ALTER TABLE ... ADD COLUMN on the original table to add a boolean field, and run an UPDATE on the table to set that field to true WHERE <condition>. Then your INSERT and DELETE commands can simply check this column for their WHERE clauses.

Don't forget to delete the column from both source and destination tables afterwards!
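A sketch of the flag-column approach, with the column name move_me made up for illustration:

```sql
-- Illustrative sketch; 'move_me' is a made-up column name.
ALTER TABLE orig_table ADD COLUMN move_me boolean NOT NULL DEFAULT false;
UPDATE orig_table SET move_me = true WHERE <condition>;  -- evaluate the expensive condition once

INSERT INTO dest_table SELECT ... FROM orig_table WHERE move_me;  -- specify columns as needed
DELETE FROM orig_table WHERE move_me;

ALTER TABLE orig_table DROP COLUMN move_me;
-- and drop it from dest_table too, if the INSERT copied it over
```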

Hmm, even less intrusive would be to create a new temporary table whose only purpose is to contain the PKs of records that you want to include. First INSERT to this table to "define" the set of rows to operate on, and then join with this table for the table-copying INSERT and DELETE. These joins will be fast since table PKs are indexed.
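The temp-table variant might look like this, assuming the source table's PK is a single column named id (all names are illustrative):

```sql
-- Illustrative sketch; assumes orig_table has a single-column PK named id.
CREATE TEMP TABLE rows_to_move AS
SELECT id FROM orig_table WHERE <condition>;  -- evaluate the condition once

INSERT INTO dest_table
SELECT o.* FROM orig_table o JOIN rows_to_move m USING (id);

DELETE FROM orig_table o USING rows_to_move m WHERE o.id = m.id;
DROP TABLE rows_to_move;  -- optional; temp tables vanish at session end anyway
```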


[EDIT] Scott Bailey's suggestion in the comments is obviously the right way to do this, wish I'd thought of it myself! Assuming all the original table's PK fields will be present in the destination table, there's no need for a temporary table -- just use the complex WHERE conditions to insert into the destination, then DELETE from the original table by joining to this table. I feel stupid for suggesting a separate table now! :)
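In SQL, that two-statement version could be sketched as follows, again assuming a shared PK column id (names are illustrative), wrapped in a transaction so the two statements succeed or fail together:

```sql
-- Illustrative sketch of the insert-then-delete-by-join approach.
BEGIN;
INSERT INTO dest_table
SELECT ... FROM orig_table WHERE <condition>;  -- complex condition runs only here

DELETE FROM orig_table o
USING dest_table d
WHERE o.id = d.id;  -- delete whatever made it into the destination
COMMIT;
```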

j_random_hacker
  • The temp table gets my vote. Updating rows and then deleting them means creating a lot of garbage in the heap, as well as requiring touching the table schema (not that that really matters) – araqnid Jun 04 '10 at 13:15
    You won't need the temp table or to do an expensive calc twice. Do the calculation once as you insert into new table. Then do a delete from old table where record is in new table. – Scott Bailey Jun 04 '10 at 23:16
  • Destination table will have plenty of data as well, so this DELETE statement is potentially big. Your idea is good, but I'm still looking for something faster. – IggShaman Jun 06 '10 at 20:20
  • @IggShaman: Although I wouldn't rule it out, I can't see how anything could be much faster, short of writing a C extension that somehow rewires the existing rows into the new table at the disk level (which is probably impossible anyway). BTW if your destination table has an index on it that includes all the PK fields of the source table, PostgreSQL will just read the index instead of the entire table. – j_random_hacker Jun 07 '10 at 02:05
-1

You might dump the table data to a file, then insert it into another table using COPY. Usually COPY is faster than INSERT.
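For example, using psql's \copy (the file path and names are illustrative; \copy runs client-side, while plain COPY reads and writes files on the server):

```sql
-- Illustrative psql sketch of the dump-and-reload approach.
\copy (SELECT * FROM orig_table WHERE <condition>) TO 'moved.csv' WITH (FORMAT csv)
\copy dest_table FROM 'moved.csv' WITH (FORMAT csv)
-- then remove the moved rows from the source:
DELETE FROM orig_table WHERE <condition>;
```

Note this evaluates the condition twice and is not atomic, which is why the in-database approaches above are usually preferable for data that is already in the database.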

pcent
    I've made some tests processing large amounts of data using triggers, row by row, and using a stored procedure with a single transaction. The stored procedure approach was faster. – pcent Jun 04 '10 at 12:45
  • You should also fine tune your PostgreSQL server to enhance the performance. Read: http://wiki.postgresql.org/wiki/Performance_Optimization – pcent Jun 04 '10 at 12:48
  • yah, I think that guideline should be qualified to say that one COPY is faster than a set of INSERT statements, one per row. INSERT...SELECT for copying data around I would think was optimal since the data isn't being passed outside the executor. – araqnid Jun 04 '10 at 13:12
    Copy is going to be faster than insert FOR EXTERNAL DATA. OP is working with data already in the database so an insert is going to be faster than exporting then copying back. – Scott Bailey Jun 04 '10 at 23:13