PosgreSQL safely handling query conflicts in replication

Question

Long running queries on a postgres aurora read replica have triggered a query conflict with the ERROR: canceling statement due to conflict with recovery - similar to what is described in this question. I was planning to increase the max_standby_archive_delay and max_standby_archive_delay mentioned in the top solution (they are set to the defaults of 14s and 30s). However, I am concerned about what the risks are of increasing this value - particularly the risk of the replica data becoming stale or out-of-date.

My question is: does this error indicate that the replica is querying data that is 14s or more behind the primary? Will increasing these values risk querying data is that is even more out of date?

When I checked the ReplicaLag metric on AWS, the max during this time was still very low - around 150ms - which is way under the existing default timeouts of 14 and 30s. Is this just the lag to writing to the log but not applying the WAL on the replica? Is there another metric that tells me how "stale" the data is on the replica?

I found this excellent article on replication conflicts and used pg_stat_database_conflicts to determine that the conflicts were all due to "Buffer pin replication conflicts", which are apparently not straightforward to resolve other than with those settings.

score 0 · Answer 1 · answered Jun 21 '23 at 10:36

0

If you are using streaming replication, the parameter is max_standby_streaming_delay. And yes, if you increase that, the data visible on the standby can start lagging more behind the primary.

No, there is no way to reliably avoid that.

answered Jun 21 '23 at 10:36

Laurenz Albe

209,280
17
206
263

PosgreSQL safely handling query conflicts in replication

1 Answers1