The righteous path
You might want to reconsider and normalize your schema after all. Normalization does not force everyone to "join for even the simplest query": create a VIEW for that.
The table could look like this:
CREATE TABLE hostname (
  hostname_id serial PRIMARY KEY
, host_id     int REFERENCES host(host_id) ON UPDATE CASCADE ON DELETE CASCADE
, hostname    text UNIQUE
);
The surrogate primary key hostname_id is optional; I prefer to have one. In your case, hostname could be the primary key, but many operations are faster with a simple, small integer key. The foreign key constraint links each hostname back to the table host.
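If you skip the surrogate key, a minimal sketch of the alternative could look like this (same columns, hostname itself as primary key):
CREATE TABLE hostname (
  hostname text PRIMARY KEY
, host_id  int REFERENCES host(host_id) ON UPDATE CASCADE ON DELETE CASCADE
);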
Create a view like this:
CREATE VIEW v_host AS
SELECT h.*
     , array_agg(hn.hostname) AS hostnames
--   , string_agg(hn.hostname, ', ') AS hostnames  -- text instead of array
FROM   host h
JOIN   hostname hn USING (host_id)
GROUP  BY h.host_id;  -- works in v9.1+
Starting with pg 9.1, the primary key in the GROUP BY clause covers all columns of that table in the SELECT list. From the release notes for version 9.1:

"Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause"
Queries can use the view like a table. Searching for a hostname will be much faster this way:
SELECT *
FROM   host h
JOIN   hostname hn USING (host_id)
WHERE  hn.hostname = 'foobar';
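Simpler lookups can go through the view directly. For illustration, a hypothetical query against v_host (host_id = 1 is a placeholder value):
SELECT *
FROM   v_host
WHERE  host_id = 1;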
The hostname search presupposes an index on host(host_id), which should be the case since host_id should be the primary key. Plus, the UNIQUE constraint on hostname(hostname) implements the other needed index automatically.
In Postgres 9.2+ a multicolumn index would be even better if you can get an index-only scan out of it:
CREATE INDEX hn_multi_idx ON hostname (hostname, host_id);
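For instance, a lookup that only needs the indexed columns can then be answered by an index-only scan, assuming the visibility map is reasonably up to date ('foobar' is a placeholder):
SELECT host_id
FROM   hostname
WHERE  hostname = 'foobar';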
Starting with Postgres 9.3, you could use a MATERIALIZED VIEW, circumstances permitting, especially if you read much more often than you write to the table.
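A minimal sketch, assuming the view definition above (mv_host is a made-up name, and a materialized view must be refreshed to reflect later writes):
CREATE MATERIALIZED VIEW mv_host AS
SELECT h.*
     , array_agg(hn.hostname) AS hostnames
FROM   host h
JOIN   hostname hn USING (host_id)
GROUP  BY h.host_id;

REFRESH MATERIALIZED VIEW mv_host;  -- rerun after writes to host / hostname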
The dark side (what you actually asked)
If I can't convince you of the righteous path, here is some assistance for the dark side:
Here is a demo of how to enforce uniqueness of hostnames. I use a table hostname to collect hostnames and a trigger on the table host to keep it up to date. Unique violations raise an exception and abort the operation.
CREATE TABLE host(hostnames text[]);
CREATE TABLE hostname(hostname text PRIMARY KEY); -- pk enforces uniqueness
Trigger function:
CREATE OR REPLACE FUNCTION trg_host_insupdelbef()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- split UPDATE into DELETE & INSERT
   IF TG_OP = 'UPDATE' THEN
      IF OLD.hostnames IS DISTINCT FROM NEW.hostnames THEN  -- keep going
      ELSE
         RETURN NEW;  -- exit, nothing to do
      END IF;
   END IF;

   IF TG_OP IN ('DELETE', 'UPDATE') THEN
      DELETE FROM hostname h
      USING  unnest(OLD.hostnames) d(x)
      WHERE  h.hostname = d.x;

      IF TG_OP = 'DELETE' THEN
         RETURN OLD;  -- exit, we are done
      END IF;
   END IF;

   -- control only reaches here for INSERT or UPDATE (with actual changes)
   INSERT INTO hostname(hostname)
   SELECT h
   FROM   unnest(NEW.hostnames) h;

   RETURN NEW;
END
$func$;
Trigger:
CREATE TRIGGER host_insupdelbef
BEFORE INSERT OR DELETE OR UPDATE OF hostnames ON host
FOR EACH ROW EXECUTE FUNCTION trg_host_insupdelbef();
SQL Fiddle with test run.
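For a quick test along those lines (the hostnames are placeholders; the second INSERT is expected to fail because 'bar' is already registered):
INSERT INTO host(hostnames) VALUES ('{foo,bar}');  -- ok, registers foo & bar
INSERT INTO host(hostnames) VALUES ('{bar,baz}');  -- raises unique violation, statement is rolled back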
Use a GIN index on the array column host.hostnames and array operators to work with it; see the sketch below.
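A minimal sketch, assuming the demo tables above (the index name and the value 'foobar' are placeholders):
CREATE INDEX host_hostnames_gin_idx ON host USING gin (hostnames);

-- the "contains" operator @> can use the GIN index
SELECT *
FROM   host
WHERE  hostnames @> '{foobar}'::text[];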