I have a set of Item
-s and each of them has a set of Tag
-s. I want to have a DB SELECT which, for some (large) subset of these items, returns the total count of each Tag
in that Item
sub-set.
Is there a way to do this using the PostgreSQL 9.3 / 9.4 array operators?
My plan B is to have a separate table Tags
and many-to-many link table Item_Tags
, and then do a:
CREATE TABLE "Tags" (
"Name" character varying,
id uuid NOT NULL,
CONSTRAINT id PRIMARY KEY (id)
);
CREATE TABLE "Items" (
id uuid NOT NULL,
data character varying,
CONSTRAINT "Items_pkey" PRIMARY KEY (id)
);
CREATE TABLE "Item_Tags" (
tag_id uuid,
item_id uuid,
id uuid NOT NULL,
CONSTRAINT "Item_Tags_pkey" PRIMARY KEY (id),
CONSTRAINT "Item_Tags_item_id_fkey" FOREIGN KEY (item_id)
REFERENCES "Items" (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT "Item_Tags_tag_id_fkey" FOREIGN KEY (tag_id)
REFERENCES "Tags" (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
Select "Tags"."Name", count(*)
From "Tags"
Join "Item_Tags" on "Tags"."id" = "Item_Tags"."tag_id"
Join "Items" on "Items"."id" = "Item_Tags"."item_id"
Where "Items"."data" in ('a', 'b', 'c', 'd', 'e') -- replace with actual criteria
Group By "Tags"."Name"
Is there a better way?
Are there any special indices to use which would help make this more efficient, assuming both the Items
and Tags
tables are large (hundreds of millions and millions items, respectively)?
If I want counts of all tags (without filtering), should I just create a view and use that?