I am back with another Postgres question. We are using the managed service from Azure (Azure Database for PostgreSQL), which relies on autovacuum: both vacuuming and statistics collection are automatic.
The problem I am facing is that a specific query, when it runs at certain hours, gets a bad plan. I realized that after collecting statistics manually the plan behaves well again.
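The manual fix is simply running ANALYZE (in my case, against the two busy tables shown further down in this post):

ANALYZE swp_am_hcbe_pro.applications;
ANALYZE swp_am_hcbe_pro.submissions;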
From the Azure documentation I got the following:
The vacuum process reads physical pages and checks for dead tuples. Every page in shared_buffers is considered to have a cost of 1 (vacuum_cost_page_hit). All other pages are considered to have a cost of 20 (vacuum_cost_page_dirty), if dead tuples exist, or 10 (vacuum_cost_page_miss), if no dead tuples exist. The vacuum operation stops when the process exceeds the autovacuum_vacuum_cost_limit.
After the limit is reached, the process sleeps for the duration specified by the autovacuum_vacuum_cost_delay parameter before it starts again. If the limit isn't reached, autovacuum starts after the value specified by the autovacuum_naptime parameter.
In summary, the autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit parameters control how much data cleanup is allowed per unit of time. Note that the default values are too low for most pricing tiers. The optimal values for these parameters are pricing tier-dependent and should be configured accordingly.
The autovacuum_max_workers parameter determines the maximum number of autovacuum processes that can run simultaneously.
With PostgreSQL, you can set these parameters at the table level or instance level. Today, you can set these parameters at the table level only in Azure Database for PostgreSQL.
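For context, these are standard PostgreSQL autovacuum parameters, so the current instance-wide values can be inspected from pg_settings:

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('autovacuum_vacuum_cost_limit',
               'autovacuum_vacuum_cost_delay',
               'autovacuum_naptime',
               'autovacuum_max_workers');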
Let's imagine that I want to tune the defaults for specific tables, as currently all of them use the database-wide defaults. Keeping that in mind, I could try the following (where X is what I don't know):
ALTER TABLE tablename SET (autovacuum_vacuum_threshold = X);
ALTER TABLE tablename SET (autovacuum_vacuum_scale_factor = X);
ALTER TABLE tablename SET (autovacuum_vacuum_cost_limit = X);
ALTER TABLE tablename SET (autovacuum_vacuum_cost_delay = X);
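Whatever values end up there, the per-table overrides can be verified afterwards in pg_class.reloptions; a quick sanity-check query along these lines (schema and table names are the ones from my setup):

SELECT c.relname, c.reloptions
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'swp_am_hcbe_pro'
  AND c.relname IN ('applications', 'submissions');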
Currently I have these values in pg_stat_all_tables:
SELECT schemaname, relname, n_tup_ins, n_tup_upd, n_tup_del,
       last_analyze, last_autovacuum, last_autoanalyze,
       analyze_count, autoanalyze_count
FROM pg_stat_all_tables
WHERE schemaname = 'swp_am_hcbe_pro'
  AND relname IN ('submissions', 'applications');
"swp_am_hcbe_pro" "applications" "264615" "11688533" "18278" "2021-11-11 08:45:45.878654+00" "2021-11-11 13:50:27.498745+00" "2021-11-10 12:02:04.690082+00" "1" "152"
"swp_am_hcbe_pro" "submissions" "663107" "687757" "51603" "2021-11-11 08:46:48.054731+00" "2021-11-07 04:41:30.205468+00" "2021-11-04 15:25:45.758618+00" "1" "20"
Those two tables are by far the ones getting most of the DML activity.
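To gauge how close each one is to its next automatic ANALYZE, the same view also exposes the relevant counters (n_mod_since_analyze is available on all currently supported versions):

SELECT relname, n_live_tup, n_dead_tup, n_mod_since_analyze
FROM pg_stat_all_tables
WHERE schemaname = 'swp_am_hcbe_pro'
  AND relname IN ('submissions', 'applications');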
Questions
- How can I determine which values for those specific autovacuum parameters are best for tables with heavy DML activity?
- How can I make Postgres run the automatic ANALYZE more often on these tables, so that I get more up-to-date statistics? According to the documentation:
autovacuum_analyze_threshold
Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
Does that mean that as soon as inserts, updates, or deletes reach 50 tuples, an automatic ANALYZE is triggered? Because I am not seeing this behaviour.
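Or is the threshold combined with autovacuum_analyze_scale_factor, i.e. the trigger point is autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples? If so, a rough per-table estimate (assuming no per-table overrides are set) would be:

SELECT c.relname,
       current_setting('autovacuum_analyze_threshold')::integer
         + current_setting('autovacuum_analyze_scale_factor')::float8 * c.reltuples
         AS approx_analyze_trigger
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'swp_am_hcbe_pro'
  AND c.relname IN ('submissions', 'applications');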
If I change the values for the tables, should I do the same for their indexes? Is there any option like CASCADE or similar, so that changing the table also applies the values to the corresponding indexes?
Thank you in advance for any advice. If you need any further details, let me know.