I using 3 RDS Postgres Instance (1 master, 2 slave, version 13). And my Postgres config is using the default parameter group.
max_standby_archive_delay = 30s
max_standby_streaming_delay = 30s
Recently, the following error occurs frequently.
ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
Reading the Postgres documentation, it is said that the above error may occur if cleanUp(Vacuum) occurs on the master while the query running on the slave is looking at the previous version snapshot.
However, I did not run the vacuum manually, and when I look up last_auto_vacuum in master, it is a long time before the error occurred.
A lot of data is modified and added due to many API requests at that time, but there is no query that actually takes more than 3 seconds among my queries.
- Why does the error occur? Vacuum has never been done manually and Auto Vacuum has not been run at that time.
- Does the max_standby_streaming_delay setting mean the maximum waiting time after the first delay?
The wal replay is delayed because a query that takes 3 seconds is being executed on the slave. If another query takes 3 seconds just before the end of the first query and this is repeated, the query requested 30 seconds after the delay of the first query is the above error is to get
00:00:00 -> running Query1 in slave (run time 3s)
00:00:01 -> slave wal_receiver received wal for sync (but is delayed because query1)
00:00:02 -> Query2 request in slave (run time 3s)
00:00:03 -> Query1 end and Query3 request in slave (run time 3s)
00:00:04 -> Query4 request in slave (run time 3s)
....
00:00:30 -> Query30 request in slave (run time 3s)
00:00:31 -> A delayed wal replay is executed. (Are Query 29 and Query 30 canceled with the above error message?)