0

I have run into a number of cases where the postgresql connection is in a bad state. OK, I'll make this complicated by having a master with a couple of slave servers... And to complicate matters even more, the master is on OS X with postgresql installed via homebrew

So things like

pg_basebackup: could not connect to server: could not connect to server: Connection refused
    Is the server running on host "XX.XX.XX.XX" and accepting
    TCP/IP connections on port 5432?

crop up. The master postgres/server.log was showing

FATAL: pre-existing shared memory block (key xx, ID yyy) is still in use.

this notwithstanding running

launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist

to start and stop the service. [head scratch] Got we worrying... So I run

launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist

I query the master's application and can edit data! WAIT postgresql was supposed to be unloaded!

So launctl is NOT working as expected... This explains why some synching up has fallen by the wayside.

Question 1: How can we ensure that postgres is really stopped on OS X because although it may be risky, the only alternative appears to be to kill the postmaster.pid

Then things went probably haywire in the flow... master's log states:

LOG: could not send data to client: Broken pipe

and effectively one of the VPS is complaining

$ service postgresql stop
Error: pid file is invalid, please manually kill the stale server process.

Question 2: how can those be killed (Ubuntu 14.04) without harming the database and the WAL process that should be updating the slave? (or is there a more effective/saner way of handling master-slave replication?)

Community
  • 1
  • 1
Jerome
  • 5,583
  • 3
  • 33
  • 76
  • Please see http://stackoverflow.com/help/on-topic – Craig Ringer Oct 06 '14 at 15:04
  • **Do not delete `postmaster.pid`**. That's almost certainly not going to be the wrong solution and very likely to risk severe database corruption. (It sounds to me like you might be trying to have multiple PostgreSQL instances share a single data directory, either on purpose or by accident. Very bad idea. Do not do that.) – Craig Ringer Oct 06 '14 at 15:05
  • Q1: by looking at the processes. Q2: by looking at the processes, and `kill -TERM` the main postgres process. – Daniel Vérité Oct 06 '14 at 15:06
  • 1
    BTW, if you have an Ubuntu instance replicating an OSX master, that's broken. Master and slave must have the same OS and architecture. – Daniel Vérité Oct 06 '14 at 15:07
  • @DanielVérité and (as you know, but just making sure it's clear in the history here) **never** by `kill -9` or `kill -KILL` the process. As someone I was helping recover from the aftermath discovered extremely expensively recently when they sigkill'd the postmaster and deleted `postmaster.pid` while a backend was running a `vacuum full`... – Craig Ringer Oct 06 '14 at 15:08

0 Answers0