
We have a Postgres database in Amazon RDS. Initially we needed to load a large amount of data quickly, so autovacuum was turned off per the best-practice suggestion from Amazon. Recently I noticed some performance issues when running queries, and then realized the database has not been vacuumed for a long time. As it turns out, many tables contain lots of dead tuples.


Surprisingly, even after I manually ran `VACUUM` on some of the tables, it did not seem to remove these dead tuples at all. `VACUUM FULL` takes too long to finish and usually ends up timing out after running a whole night.

Why does the `VACUUM` command not work? What are my other options? Should I restart the instance?

ddd
  • There's always the brute-force option of 1) creating a new table, 2) copying the data into it from the old table, 3) dropping the old table, and 4) renaming the new table to the old name. Repeat for every table that has a lot of dead rows. – Bohemian Dec 21 '17 at 06:10
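The brute-force rewrite from the comment above might be sketched like this (the table name `events` is a placeholder; note that indexes, constraints, views, and foreign keys referencing the old table may need to be recreated separately):

```sql
-- Rewrite a bloated table by copying only the live rows into a fresh table.
BEGIN;
CREATE TABLE events_new (LIKE events INCLUDING ALL);
INSERT INTO events_new SELECT * FROM events;
DROP TABLE events;
ALTER TABLE events_new RENAME TO events;
COMMIT;
```

This takes an exclusive lock on the old table at `DROP`, so it is only practical during a maintenance window.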
  • I think Amazon suggests disabling autovacuum only for the duration of a database restore... In general, disabling autovacuum only brings problems. It even says in your link: "Important: Not running autovacuum can result in an eventual required outage to perform a much more intrusive vacuum operation..." and so on – Vao Tsun Dec 21 '17 at 06:45
  • 1
    if `vacuum full` times out, set statement_timeout to zero – Vao Tsun Dec 21 '17 at 06:49

2 Answers


Use `VACUUM (VERBOSE)` to get detailed statistics about what it is doing and why.
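For example, on a single table (`mytable` is a placeholder name):

```sql
VACUUM (VERBOSE) mytable;
```

The verbose output reports, among other things, how many dead row versions could be removed and how many cannot be removed yet, which points you at one of the causes below.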

There are three reasons why dead tuples cannot be removed:

  1. There is a long-running transaction that has not been closed. You can find the bad boys with

    SELECT pid, datname, usename, state, backend_xmin
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL
    ORDER BY age(backend_xmin) DESC;
    

    You can get rid of a transaction with pg_cancel_backend() or pg_terminate_backend().
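For instance, using a pid from the query above (the pid shown here is hypothetical):

```sql
SELECT pg_cancel_backend(12345);    -- cancel the backend's current query
SELECT pg_terminate_backend(12345); -- terminate the whole session
```

`pg_cancel_backend()` is the gentler option; fall back to `pg_terminate_backend()` only if cancelling is not enough.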

  2. There are prepared transactions which have not been committed. You can find them with

    SELECT gid, prepared, owner, database, transaction
    FROM pg_prepared_xacts
    ORDER BY age(transaction) DESC;
    

    Use COMMIT PREPARED or ROLLBACK PREPARED to close them.
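A hypothetical example, using a `gid` returned by the query above:

```sql
ROLLBACK PREPARED 'my_prepared_gid';
-- or, if the transaction's work should be kept:
-- COMMIT PREPARED 'my_prepared_gid';
```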

  3. There are replication slots which are not used. Find them with

    SELECT slot_name, slot_type, database, xmin
    FROM pg_replication_slots
    ORDER BY age(xmin) DESC;
    

    Use pg_drop_replication_slot() to delete an unused replication slot.
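For example, with a slot name taken from the query above (the name is a placeholder):

```sql
SELECT pg_drop_replication_slot('unused_slot');
```

Only drop a slot once you are sure no standby or logical consumer still needs it; dropping an active slot breaks that replication stream.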

Laurenz Albe
  • I was able to delete all the dead tuples after killing the transactions. Why do I have so many live tuples though? Do they do anything? – ddd Dec 21 '17 at 14:39
  • Yes, they are the data in your tables. After a `VACUUM` the number of rows in a table should be about as many as the number of live tuples. – Laurenz Albe Dec 21 '17 at 14:47
  • All these queries return 0 rows, except for one that returns only the row for the query itself pulling this info from pg_stat_activity. – Nae Jul 27 '22 at 10:44

https://dba.stackexchange.com/a/77587/30035 explains why not all dead tuples are removed.

For `VACUUM FULL` not to time out, set statement_timeout = 0.
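For example, for the current session only (`mytable` is a placeholder):

```sql
SET statement_timeout = 0;  -- disable the per-statement timeout for this session
VACUUM FULL mytable;
```

Keep in mind that `VACUUM FULL` rewrites the table and holds an exclusive lock on it for the duration.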

http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.PostgreSQL recommends disabling autovacuum only for the duration of a database restore; beyond that, they definitely recommend using it:

Important

Not running autovacuum can result in an eventual required outage to perform a much more intrusive vacuum operation.

Cancelling all sessions and vacuuming the table should help with the existing dead tuples (regarding your suggestion to restart the cluster). But what I suggest you do first is switch autovacuum on. It is probably better to control vacuuming per table, not for the whole cluster, e.g. with autovacuum_vacuum_threshold (via ALTER TABLE); reference here: https://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
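Per-table autovacuum settings can be tuned like this (the table name and threshold values are illustrative, not recommendations):

```sql
ALTER TABLE mytable SET (
    autovacuum_enabled = true,
    autovacuum_vacuum_threshold = 1000,     -- min dead tuples before a vacuum
    autovacuum_vacuum_scale_factor = 0.05   -- plus 5% of the table's rows
);
```

Lowering the scale factor on very large, frequently updated tables makes autovacuum run before bloat accumulates.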

Vao Tsun