0

We have a system with PostgreSQL 12.x where all changes are being written to master database server and two read-only streaming async replicas are used to reduce load from the master server for read-only transactions that can deal with slight delay.

Because the async replica may be delayed from the master in some cases we need a method to query the latency (delay) for the replication. We do not want to contact the master server to do this so one obvious way to do this is to query delay from the replica server:

select
(extract(epoch from now()) - extract(epoch from last_msg_send_time)) * 1000 as delay_ms
from pg_stat_wal_receiver;

However, it seems that pg_stat_wal_receiver has no data for our slave machines. It does have one row but only the pid column has data and every other column is empty. The documentation is unclear about the details but may it be that pg_stat_wal_receiver has data only for sync streaming replica?

Is there a way to figure out streaming delay of async replica? I'm hoping this is just some kind of configuration error instead of "this is not supported".

All the server machines are running PostgreSQL 12.2 but the client machines are still running PostgreSQL 9.5 client library in case it makes a difference.

Mikko Rantalainen
  • 14,132
  • 10
  • 74
  • 112
  • I believe this question belongs to stackoverflow instead dba.stackexchange.com because the info about delay is needed by client application to allow client application to automatically switch to another slave or master database connection in case delay is too high. – Mikko Rantalainen Jun 25 '20 at 14:25

2 Answers2

1

I think I can answer the question about the missing columns of pg_stat_wal_receiver. To read the rest of columns, you need to login as superuser or a login role being granted the pg_read_all_stats privilege/role.

This behavior is documented in the source code of walreceiver.c, in the implementation of pg_stat_get_wal_receiver, says:

...
/*
 * Only superusers and members of pg_read_all_stats can see details.
 * Other users only get the pid value to know whether it is a WAL
 * receiver, but no details.
 */
...
ItterK
  • 26
  • 1
  • Great find! Now I just need to figure out how to grant this access to selected normal users so the delay can be computed without superuser access. – Mikko Rantalainen Aug 29 '20 at 11:34
  • This works great in Ubuntu: `echo "select (extract(epoch from now()) - extract(epoch from last_msg_send_time)) * 1000 as delay_ms from pg_stat_wal_receiver;" | sudo -u postgres psql "dbname=mydb1"` – Mikko Rantalainen Aug 29 '20 at 11:36
  • I'm glad I can help :) . – ItterK Aug 29 '20 at 14:29
0

I don't understand why the table pg_stat_wal_receiver does not have data, but here's a workaround for the missing latency data:

select now() - pg_last_xact_replay_timestamp() as replication_lag;

or if you want the lag as milliseconds (plain number):

select round(extract(epoch from (now() - pg_last_xact_replay_timestamp())*1000)) as replication_lag_ms;

Note that this uses function pg_last_xact_replay_timestamp() (emphasis mine):

Get time stamp of last transaction replayed during recovery. This is the time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL.

However, it seems that async streaming replication does increment this timestamp continuously when system has normal load (active writing on master). It's still unclear if this timestamp stops increasing if master has no changes but the streaming replication is active.

Mikko Rantalainen
  • 14,132
  • 10
  • 74
  • 112