1

I managed to delete 4,000 rows from a table in my 129,000-row production database (Postgres 9.4 on Heroku), but only identified the problem a few days later.

I have a backup from before the loss, but only want to selectively restore the missing rows back to the table, preserving their id's. (A complete restore is not an option as new data has since been added to the table.)

Into a local testing database I have imported the backed-up table as articles_backup, alongside the actual articles table. I want to find all the rows in articles_backups that are missing from articles and then copy these to a new table articles_restores that I will then restore to the production database, back into the articles table (preserving record id's).

This query successfully returns all the id's of the deleted records:

select articles_backups.id
from articles_backups
left outer join articles on (articles_backups.id = articles.id)
where articles.id is null

But I have not been able to copy the result to a new table. I have unsuccessfully tried:

select *
into articles_restores
from articles_backups
left outer join articles on (articles_backups.id = articles.id)
where articles.id is null;

Which gives:

ERROR:  column "id" specified more than once
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
djoll
  • 1,139
  • 1
  • 12
  • 31

2 Answers2

2

Basically your query with LEFT JOIN / IS NULL does what you are after:

You get the error because you select all columns from both tables, and there is an id column in both. It's not possible to create a new table with duplicate column names, and it's not what you want to begin with. Only select columns from articles_backups:

CREATE TABLE articles_restores AS
SELECT ab.*
FROM   articles_backups ab
LEFT   JOIN articles a USING (id)
WHERE  a.id IS NULL;

While being at it I simplified your query syntax with table aliases. The USING clause is just for the convenience of shorter code. It folds the two id columns into one, but all other columns are still in there twice if you SELECT *.

Use CREATE TABLE AS. SELECT INTO is also defined by the SQL standard and implemented in Postgres, but its use is discouraged. It's used in PL/pgSQL functions for a different purpose. Details:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

You could use an except to retrieve all the rows from articles_backup that are different from articles:

(assuming both tables have the same columns in the same order)

you could also create a temp table with this info to make it easy on your repairing statements:

create table temp_articles as
select * from articles_backup 
except
select * from articles

step 1 - update rows from 'articles_backup' present in articles.

This step needs attention... you will have to establish a rule to choose between the data present in articles and the one present in temp_articles.

UPDATE articles a
SET a.col1=b.col1,
    a.col2=b.col2,
    (... other columns ...)
FROM (SELECT * FROM  temp_articles) AS b
WHERE a.id = b.id and /* your rule for data to be (or not) updated goes here */

step 2 - insert rows from 'articles_backup' not present in articles (your deleted records):

insert into articles
select * from temp_articles where id not in (select id from articles)

Let us know if you need more help.

The Fabio
  • 5,369
  • 1
  • 25
  • 55
  • Thanks Fabio / great info there, which lead me to being able to restore the missing tables with a few tweaks. – djoll Sep 03 '15 at 05:32