6

We had an ugly problem, by mistake, a balancer redirect some requests to a test instance with pretty similar data than production, now I know that there are data recorded in the test Postgres that belongs to production

Is there a way to list all the tables with data changes in the last 24 hours in Postgres?

Postgres version is 9.3 and I have around 250 tables.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
martinkenneth
  • 142
  • 1
  • 10
  • 4
    *Before* you do anything else: shut down the test DB cluster immediately and make a file-level copy of the whole DB data directory. Internal data and / or current physical order of rows may hold valuable clues - until you do something else to the tables (including `VACUUM`). After starting the DB cluster again, also draw a plain backup of the test DB. – Erwin Brandstetter Sep 10 '15 at 22:28

1 Answers1

8

First consider my comment.

Postgres up to and including 9.4 does not by itself record timestamps when rows were inserted or updated.

There are some system columns in the row headers that can help in the forensic work. The physical order of rows (ctid) can be an indicator if nothing else has happened to the table since. In simple cases new rows are appended to the physical end of a table when inserted, so the ctid indicates what was inserted last - until anything changes in the table. Postgres is free to rearrange the physical order of rows at will, for instance with VACUUM. Any UPDATE also writes a new row version, which can change the physical position. The new version does not have to be at the end of the table. Postgres tries to keep new row version on the same data page if possible (HOT update) ...

That said, here is a simple query to get the physically last rows for a given table:

SELECT ctid, *
FROM   big
ORDER  BY ctid DESC
LIMIT  10;

Related answers on dba.SE with detailed information:

The insert transaction id xmin can be useful:

If you happen to have a backup for the test DB from right before the incident, that would be helpful. Restore the old state to a separate schema of the test DB and compare tables ...

Typically, I add one or two timestamptz columns to important tables for when the row was inserted, and / or when it was updated the last time. That would be tremendously useful for you right now ...

What would also be great for you: the "temporal" features introduced in the SQL standard with SQL:2011. But that's not implemented in Postgres, yet.
There's a page in the Postgres Wiki.
There is also an unofficial extension on PGXN. I have not tested it and can't say how far it is.

Postgres 9.5 introduces a feature to record commit timestamps (like @Craig commented). Needs to be enabled manually before it starts recording. The manual:

track_commit_timestamp (bool)

Record commit time of transactions. This parameter can only be set in postgresql.conf file or on the server command line. The default value is off.

And some functions to work with it.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    Commit timestamps, introduced in 9.5, would also help with this if enabled. – Craig Ringer Sep 10 '15 at 23:04
  • @Craig: Very interesting. Would be gold for the situation at hand. Details in the 9.5 manual: http://www.postgresql.org/docs/9.5/static/functions-info.html#FUNCTIONS-COMMIT-TIMESTAMP and http://www.postgresql.org/docs/9.5/static/runtime-config-replication.html#GUC-TRACK-COMMIT-TIMESTAMP I'll add that to my answer if you don't mind. – Erwin Brandstetter Sep 10 '15 at 23:15
  • 1
    Of course, go ahead. Note that commit timestamp collection must be enabled beforehand, though, and that commit timestamp information is not retained indefinitely. – Craig Ringer Sep 10 '15 at 23:17
  • Cleanup will be nasty if the test <--> production tables have separate surrogate keys, but no natural keys. Basically insolvable. – wildplasser Sep 10 '15 at 23:34
  • I have a clean backup of the test db, ctid looks a good option, could you share a sample query please? I guess that will be posible make something like this mysql answer http://stackoverflow.com/questions/2492040/query-to-find-tables-modified-in-the-last-hour – martinkenneth Sep 10 '15 at 23:57
  • 1
    @martinkenneth: I added a sample query and some links. – Erwin Brandstetter Sep 11 '15 at 00:22
  • 1
    @martinkenneth: BTW, the [question you linked](http://stackoverflow.com/questions/2492040/query-to-find-tables-modified-in-the-last-hour) is for MySQL and answers do not work for Postgres at all. – Erwin Brandstetter Sep 11 '15 at 01:24
  • @ErwinBrandstetter yeah, I know, I was guessing that something similar was available in postgres, sadly no – martinkenneth Sep 11 '15 at 19:33