We want to shard our PostgreSQL DB, due to high disk load. Firstly, we looked at django-sharding library, but:
- Very much rewriting in our backend
- Migrating all tables to 64-bit primary keys is hard work on 300-400gb tables
- Generating ids with Postgres Specific algorithm makes it impossible to move data from shard to shard. More than that, we have a large database with old ids. Updating all of them is a big problem too.
- Generating ids with special tables makes us do a special SELECT query to main database every time we insert data. We have high write load, so it's not good.
Considring all these, we decided too look on Postgres database sharding solutions. We found 2 opportunities - Citus and PostgresXL. Citus makes us change data format too much and rewrite a big bunch of backend at the same time, so we are about to try PostgresXL as more transparent solution. But reading the docs, I can't understand some things and will be greatfull for recomendations:
- Are there any other sharding workarounds except for Citus and PostgresXL? It would be good not to change much in our database on migrating.
- Some questions about PostgresXL:
- Do I understand correctly, that it's not Postgres extension, it's a standalone fork? So I should build all its parts from sources and than move data in some way?
- How are Postgres and PostgresXL versions compatible? We have PostgreSQL 9.4. I don't see such a version in PostgresXL (9.2 or 9.5 no middle?). So can I use, for example, streaming replication for migration?
- If yes/no, what is the best solution to migrate data? If I have 2Tb database with heavy write, can I migrate it somehow without stopping for a long period of time?
Thanks.