I need to scale out our application DB due to the amount of data. It's on PostgreSQL 9.3. So, I've found PostgreSQL-XL and it looks awesome, but I'm having a hard time trying to wrap my head around the limitations for distributed tables. To distribute them by replication (where the whole table is replicated in every datanode) is quite OK, but let's say I have two big related tables that need to be "sharded" along the datanodes:
CREATE TABLE foos
(
id bigserial NOT NULL,
project_id integer NOT NULL,
template_id integer NOT NULL,
batch_id integer,
dataset_id integer NOT NULL,
name text NOT NULL,
CONSTRAINT pk_foos PRIMARY KEY (id),
CONSTRAINT fk_foos_batch_id FOREIGN KEY (batch_id)
REFERENCES batches (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_dataset_id FOREIGN KEY (dataset_id)
REFERENCES datasets (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_project_id FOREIGN KEY (project_id)
REFERENCES projects (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_template_id FOREIGN KEY (template_id)
REFERENCES templates (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT uc_foos UNIQUE (project_id, name)
);
CREATE TABLE foo_childs
(
id bigserial NOT NULL,
foo_id bigint NOT NULL,
template_id integer NOT NULL,
batch_id integer,
ffdata hstore,
CONSTRAINT pk_ff_foos PRIMARY KEY (id),
CONSTRAINT fk_fffoos_batch_id FOREIGN KEY (batch_id)
REFERENCES batches (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_fffoos_foo_id FOREIGN KEY (foo_id)
REFERENCES foos (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_fffoos_template_id FOREIGN KEY (template_id)
REFERENCES templates (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
);
Now Postgres-XL documentation states that:
- "(...) in distributed tables, UNIQUE constraints must include the distribution column of the table"
- "(...) the distribution column must be included in PRIMARY KEY"
- "(...) column with REFERENCES (FK) should be the distribution column. (...) PRIMARY KEY must be the distribution column as well."
Their examples are over simplistic and scarse, so can someone please DDL me the two above tables for postgres-XL using DISTRIBUTE BY HASH()?
Or maybe suggest other ways to scale out?