3

We want to shard our PostgreSQL DB, due to high disk load. Firstly, we looked at django-sharding library, but:

  1. Very much rewriting in our backend
  2. Migrating all tables to 64-bit primary keys is hard work on 300-400gb tables
  3. Generating ids with Postgres Specific algorithm makes it impossible to move data from shard to shard. More than that, we have a large database with old ids. Updating all of them is a big problem too.
  4. Generating ids with special tables makes us do a special SELECT query to main database every time we insert data. We have high write load, so it's not good.

Considring all these, we decided too look on Postgres database sharding solutions. We found 2 opportunities - Citus and PostgresXL. Citus makes us change data format too much and rewrite a big bunch of backend at the same time, so we are about to try PostgresXL as more transparent solution. But reading the docs, I can't understand some things and will be greatfull for recomendations:

  1. Are there any other sharding workarounds except for Citus and PostgresXL? It would be good not to change much in our database on migrating.
  2. Some questions about PostgresXL:
    • Do I understand correctly, that it's not Postgres extension, it's a standalone fork? So I should build all its parts from sources and than move data in some way?
    • How are Postgres and PostgresXL versions compatible? We have PostgreSQL 9.4. I don't see such a version in PostgresXL (9.2 or 9.5 no middle?). So can I use, for example, streaming replication for migration?
    • If yes/no, what is the best solution to migrate data? If I have 2Tb database with heavy write, can I migrate it somehow without stopping for a long period of time?

Thanks.

M1ha Shvn
  • 103
  • 5

1 Answers1

0

First off to save your self a LOT of headache have you looked at options Like Amazon's Auora, Dynomo, Red Shift, etc services? They are VERY cost effective at scale, as well as optimized and managed for you.

Actually Amazon's straight Postgress databases can handle MASSIVE amounts of reads or writes. We can go into 2,000- 6,000 IOPS on reads and another 2,000 to 6,000 IOPS in writes without issue. I would really look into this as the option. Azure, Oracle, and Google also have competing services.

Also be aware that Postgres-XL beyond all reason has no HA support. If you lose a single node you lose everything. The nodes can not fail over.

it's a standalone fork?

Yes, They are very different apps and developed separate from each other.

How are Postgres and PostgresXL versions compatible?

They arn't compatible. You can not just migration Postgres to Postgresl-XL. They work VERY differently.

Generating ids with Postgres Specific algorithm makes it impossible to >move data from shard to shard

Not following this, but with sharing you are not supposed to move data from one shard to another. The key being used generally needs to be something specific and unique to split/segregate your data on. Like a date, or a "type" field, or some other (hopefully ordered) field(s)/column(s). This breaks things up but has obvious pain in the a$$ limitations.

Are there any other sharding workarounds except for Citus and PostgresXL? It would be good not to change much in our database on >>migrating.

Tons of options, but right off the bat going from a standard RDS, to a NoSql, or MPP database is going to be a major migration, a lot of effort, and have a LOT of limitations no matter what you do.

Next Postress-XL and Citus are MPP (massive parallel processing) clustering apps, not sharing specifically. That is part of what they can do, but it is not their focus.

Other options for MPP

pgPool -- (not great for heavy writes )

haProxy -- ( have not done it but read about it. Lost of work to setup and maintain. )

MySql Cluster -- (Huge pain to use the OSS version and major $$$ for the commercial version)

Green Plumb

Teradata

Vertica

what is the best solution to migrate data?

Very unlikely to find a simple migration for this kind of switch. You can expect to likely need to export the data your self from the existing RDS and import it to the new DB and will likely have to write something your self to get it the way you want it.

BrianC
  • 1,793
  • 1
  • 18
  • 26