Migrating from Postgresql to Postgres-XL: distributed tables design

Question

I need to scale out our application DB due to the amount of data. It's on PostgreSQL 9.3. So, I've found PostgreSQL-XL and it looks awesome, but I'm having a hard time trying to wrap my head around the limitations for distributed tables. To distribute them by replication (where the whole table is replicated in every datanode) is quite OK, but let's say I have two big related tables that need to be "sharded" along the datanodes:

CREATE TABLE foos
(
  id bigserial NOT NULL,
  project_id integer NOT NULL,
  template_id integer NOT NULL,
  batch_id integer,
  dataset_id integer NOT NULL,
  name text NOT NULL,
  CONSTRAINT pk_foos PRIMARY KEY (id),
  CONSTRAINT fk_foos_batch_id FOREIGN KEY (batch_id)
      REFERENCES batches (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_foos_dataset_id FOREIGN KEY (dataset_id)
      REFERENCES datasets (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_foos_project_id FOREIGN KEY (project_id)
      REFERENCES projects (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_foos_template_id FOREIGN KEY (template_id)
      REFERENCES templates (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT uc_foos UNIQUE (project_id, name)
);

CREATE TABLE foo_childs
(
  id bigserial NOT NULL,
  foo_id bigint NOT NULL,
  template_id integer NOT NULL,
  batch_id integer,
  ffdata hstore,
  CONSTRAINT pk_ff_foos PRIMARY KEY (id),
  CONSTRAINT fk_fffoos_batch_id FOREIGN KEY (batch_id)
      REFERENCES batches (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_fffoos_foo_id FOREIGN KEY (foo_id)
      REFERENCES foos (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_fffoos_template_id FOREIGN KEY (template_id)
      REFERENCES templates (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE 
);

Now Postgres-XL documentation states that:

"(...) in distributed tables, UNIQUE constraints must include the distribution column of the table"
"(...) the distribution column must be included in PRIMARY KEY"
"(...) column with REFERENCES (FK) should be the distribution column. (...) PRIMARY KEY must be the distribution column as well."

Their examples are over simplistic and scarse, so can someone please DDL me the two above tables for postgres-XL using DISTRIBUTE BY HASH()?

Or maybe suggest other ways to scale out?

score 0 · Answer 1 · edited Feb 16 '15 at 22:31

0

CREATE TABLE foos
( ... ) DISTRIBUTE BY HASH(id);

CREATE TABLE foos_child
( ... ) DISTRIBUTE BY HASH(foo_id);

Now any join on foos.id = foos_child.foo_id can be pushed down and done locally.

edited Feb 16 '15 at 22:31

answered Feb 16 '15 at 22:02

LotsOfData

66
1

Thanks for the quick reply but I had tried that before, it was the first thing I did actually, with no luck. I get this error trying to create the first table: `ERROR: Unique index of partitioned table must contain the hash distribution column.` I guess it's because of the unique constraint, so let's say i could get rid of it, then I get this other error: `ERROR: there is no unique constraint matching given keys for referenced table "batches"` and this one is due to the referencing column batch_id which is a foreign key and should be the distribution column in a distributed table, right? – Joe Feb 17 '15 at 15:39

Migrating from Postgresql to Postgres-XL: distributed tables design

1 Answers1