The PostgreSQL service on the standby node of my PostgreSQL cluster starts and stops again and again after I started the Patroni service.
I want to build a PostgreSQL HA cluster on two computers using Patroni 1.6.4 and etcd 3.3. Firstly, I built an etcd cluster, which is healthy.
member 230e646882873b50 is healthy: got healthy result from http://10.19.170.119:2379
member afcefe35d67a646c is healthy: got healthy result from http://10.19.170.155:2379
cluster is healthy
Next, I built a stream replication PostgreSQL cluster on two computers (running on port 5433), which worked fine.
Then I stopped then PostgreSQL cluster and started Patroni on the primary server and the standby server.
The PostgreSQL and Patroni service on the primary server seemed working normally.
2020-04-02 18:17:22,402 INFO: Lock owner: pgsql_119; I am pgsql_119
2020-04-02 18:17:22,430 INFO: no action. i am the leader with the lock
2020-04-02 18:17:26,402 INFO: Lock owner: pgsql_119; I am pgsql_119
2020-04-02 18:17:26,430 INFO: no action. i am the leader with the lock
But a problem occurred on the standby server. Patroni on the standby server printed messages as follows:
2020-04-02 18:45:25,995 INFO: no action. i am a secondary and i am following aleader
2020-04-02 18:45:27,285 INFO: closed patroni connection to the postgresql cluster
2020-04-02 18:45:27,722 INFO: postmaster pid=7448
2020-04-02 18:45:27.832 HKT [7448] LOG: listening on IPv4 address "0.0.0.0", port 5433
2020-04-02 18:45:27.994 HKT [7448] LOG: redirecting log output to logging collector process
2020-04-02 18:45:27.994 HKT [7448] HINT: Future log output will appear in directory "log".
2020-04-02 18:45:30,058 INFO: Lock owner: pgsql_node119; I am pgsql_node155
2020-04-02 18:45:30,058 INFO: does not have lock
2020-04-02 18:45:30,058 INFO: establishing a new patroni connection to the postgres cluster
2020-04-02 18:45:31,162 INFO: no action. i am a secondary and i am following aleader
2020-04-02 18:45:32,460 INFO: closed patroni connection to the postgresql cluster
2020-04-02 18:45:32,875 INFO: postmaster pid=8820
2020-04-02 18:45:32.996 HKT [8820] LOG: listening on IPv4 address "0.0.0.0", port 5433
2020-04-02 18:45:33.161 HKT [8820] LOG: redirecting log output to logging collector process
2020-04-02 18:45:33.161 HKT [8820] HINT: Future log output will appear in directory "log".
2020-04-02 18:45:35,211 INFO: Lock owner: pgsql_node119; I am pgsql_node155
2020-04-02 18:45:35,211 INFO: does not have lock
2020-04-02 18:45:35,211 INFO: establishing a new patroni connection to the postgres cluster
2020-04-02 18:45:37,215 INFO: establishing a new patroni connection to the postgres cluster
The content of postgresql logs repeated as follows:
FATAL: the database system is starting up
LOG: redo starts at 0/3A000060
LOG: consistent recovery state reached at 0/3A000140
LOG: invalid record length at 0/3A000140: wanted 24, got 0
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3A000000 on timeline 40
LOG: received fast shutdown request
LOG: aborting any active transactions
FATAL: terminating connection due to administrator command
FATAL: terminating walreceiver process due to administrator command
LOG: shutting down
LOG: database system is shut down
LOG: database system was shut down in recovery at 2020-04-02 18:50:20 CST
That meant PostgreSQL on the standby server restarted every 5 seconds!
Here is one of my patroni.yml. Another one is the same except the ip address.
scope: pgsql
namespace: /pgsql/
name: pgsql_node119
restapi:
listen: 10.19.170.119:8008
connect_address: 10.19.170.119:8008
etcd:
host: 10.19.170.119:2379
bootstrap:
# this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
# and all other cluster members will use it as a `global configuration`
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
master_start_timeout: 300
synchronous_mode: false
# check_timeline: true
postgresql:
use_pg_rewind: true
use_slots: true
postgresql:
listen: 0.0.0.0:5433
connect_address: 10.19.170.119:5433
data_dir: "/opt/postgresql-11/data"
bin_dir: "/opt/postgresql-11/bin"
# config_dir: /etc/postgresql/9.6/main
authentication:
replication:
username: repuser
password: repuserpwd
superuser:
username: postgres
password: postgrespwd
#watchdog:
# mode: automatic # Allowed values: off, automatic, required
# device: /dev/watchdog
# safety_margin: 5
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
Do you know why this happened or how to solve this problem? Thank you.