
I have installed and set up a two-node cluster of Postgres-XL 9.2, where the coordinator and GTM run on node1 and the datanode runs on node2.

Now, before I use it in production, I have to deliver a disaster recovery (DR) solution. Does anyone have a DR plan for the Postgres-XL 9.2 architecture?

Best Regards, Aviel B.

    A DR plan isn't something you get out of a can. It's specific to your needs, your app, your downtime/recovery windows, your cost trade-offs. – Craig Ringer Jul 02 '14 at 13:22
  • Assuming we're talking about the standard requirements of the same hardware on each site, downtime of a few minutes, fast app switchover/failover, and an application that tolerates a brief connection loss, what are my options? By the way, a storage-based solution is NOT an option. – user3796774 Jul 02 '14 at 20:27
  • I was going to propose storage replication/backup at intervals... I think any non-filesystem-based solution would take much more than a few minutes for a large database. To help people answer: what tools come packed with Postgres-XL, and what would you *not* do to prepare for a disaster? – Steve K Jul 29 '14 at 20:08

1 Answer


So from what you described, you have only one of each node... What are you expecting to recover to?

Postgres-XL is a clustered solution. With only one of each node you have no cluster: not only do you get no scaling advantage, it will actually run slower than standalone PostgreSQL. Worse, you have nothing to recover to; if you lose either node, you have completely lost the database.

Also, the docs recommend putting a coordinator and a datanode on the same server if you are going to combine nodes.

So the simplest solution in replication mode would need something like:

  • Server1 GTM
  • Server2 GTM Proxy
  • Server3 Coordinator 1 & DataNode 1
  • Server4 Coordinator 2 & DataNode 2
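A layout like the one above can be described to the pgxc_ctl deployment tool. The fragment below is only a rough sketch of such a configuration; the hostnames, ports, and directories are assumptions for illustration, not taken from the question, so check them against the pgxc_ctl documentation for your version:

```
# pgxc_ctl.conf sketch (illustrative values only)
gtmName=gtm
gtmMasterServer=server1          # Server1: GTM
gtmMasterPort=20001

gtmProxy=y                       # Server2: GTM proxy
gtmProxyNames=(gtm_pxy1)
gtmProxyServers=(server2)

coordNames=(coord1 coord2)       # Servers 3 & 4: coordinator + datanode
coordMasterServers=(server3 server4)
coordPorts=(5432 5432)

datanodeNames=(dn1 dn2)
datanodeMasterServers=(server3 server4)
datanodePorts=(15432 15432)
```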

Postgres-XL has no built-in failover support, so any failure will require manual intervention.

If you use the DISTRIBUTE BY REPLICATION option, every datanode holds a full copy of the data, so you would just remove the failing node from the cluster and restart everything.
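For example, a table kept as a full copy on every datanode is declared with the REPLICATION clause (the table and column names here are made up for illustration):

```sql
-- Every datanode holds a complete copy of this table,
-- so the cluster survives the loss of any single datanode.
CREATE TABLE accounts (
    id      bigint PRIMARY KEY,
    balance numeric NOT NULL
) DISTRIBUTE BY REPLICATION;
```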

If you use another DISTRIBUTE BY option, the data is spread over multiple nodes, which means losing any one node loses everything. For that option you need a slave instance of every datanode and coordinator node you have. If a node fails, you remove it from the cluster, replace it with its slave backup node, and then restart it all.
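By contrast, a hash-distributed table spreads its rows across the datanodes, so each node holds only a fraction of the data (again, names are illustrative):

```sql
-- Rows are spread across datanodes by hashing the id column;
-- losing any one datanode loses part of this table,
-- hence the need for a slave of every node.
CREATE TABLE events (
    id      bigint,
    payload text
) DISTRIBUTE BY HASH (id);
```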

BrianC