The problem with {country,state,city} is that they appear to be a candidate key for the referenced table. In SQL, {country,state,city} cannot be a candidate key (or even a primary key), if state (or country) can be missing or NULL. (this could be avoided by allowing an empty string for them, which is different from NULL, but that would be an ugly hack, IMO) The same would go for zipcode, which could only be made into a candidate key by adding country
to it. And both could be missing, unknown or NULL.
The only way around the crippled candidate keys would be to demote them to (non-unique) indexes, and add a surrogate primary key, like in:
CREATE TABLE cities
( city_id INTEGER NOT NULL PRIMARY KEY -- could be a serial ...
, country_name varchar -- you _could_ squeeze this out into a separate "countries" table
, state_name varchar -- you could even squeeze this out, but it would need a composite FK
, city_name varchar NOT NULL
);
CREATE TABLE adresses
( person_id INTEGER NOT NULL PRIMARY KEY -- could be a serial
, last_name varchar NOT NULL
, first_first_name varchar
, gender CHAR(1)
, dob DATE
, city_id INTEGER references cities(city_id) -- could be NOT NULL
);
WRT {city,state}
: you could squeeze these out into a junction table (this basically is a BCNF problem, maybe even a 4NF problem, if all the join fields were non-NULLABLE) like in:
--
-- Plan B:
--
CREATE TABLE country2
( country_id INTEGER NOT NULL PRIMARY KEY -- could be a serial ...
, country_name varchar NOT NULL
, country_iso varchar
-- ...
, UNIQUE (country_name)
);
CREATE TABLE country_state2
( cs_id INTEGER NOT NULL PRIMARY KEY -- could be a serial ...
, country_id INTEGER NOT NULL REFERENCES country2(country_id)
, state_name varchar
);
CREATE TABLE cities2
( city_id INTEGER NOT NULL PRIMARY KEY -- could be a serial ...
, cs_id INTEGER REFERENCES country_state2(cs_id)
, city_name varchar NOT NULL
);
CREATE TABLE adresses2
( person_id INTEGER NOT NULL PRIMARY KEY -- could be a serial
, last_name varchar NOT NULL
, first_first_name varchar
, gender CHAR(1)
, dob DATE
, city_id INTEGER references cities2(city_id) -- could be NOT NULL
);
Whether you should actually do this is a matter of taste (see @Joel Brown's answer). The normallisation would certainly help in case of massive renaming operations, like the merging of municipalities in the OQ. For small sets of addresses (upto maybe a few thousand), the extra complexity would probably cost more than it would gain.
This complexity is particularly costly for the front-end applications used to maintain the data. For the DBMS, a few joins would not cost that much (for small sizes) and could even help performance (for larger sizes). Normalisation is not bad for performance.
UPDATE (after Mike Sherill catcall's comment):
If we could impose NOT NULL
constraints on {country,state,city} (or there ids),
we could also impose UNIQUE constraints on the (composite) candidate keys that they are part of:
--
-- Plan C:
--
CREATE TABLE country3
( country_id INTEGER NOT NULL PRIMARY KEY -- could be a serial ...
, country_name varchar NOT NULL
, country_iso varchar
, UNIQUE (country_name)
);
CREATE TABLE country_state3
( cs_id INTEGER NOT NULL PRIMARY KEY -- could be a serial ...
, country_id INTEGER NOT NULL REFERENCES country3(country_id)
, state_name varchar NOT NULL
, UNIQUE (country_id,state_name)
);
CREATE TABLE cities3
( city_id INTEGER NOT NULL PRIMARY KEY -- could be a serial ...
, cs_id INTEGER NOT NULL REFERENCES country_state3(cs_id)
, city_name varchar NOT NULL
, UNIQUE (cs_id,city_name)
);
CREATE TABLE adresses3
( person_id INTEGER NOT NULL PRIMARY KEY -- could be a serial
, last_name varchar NOT NULL
, first_first_name varchar
, gender CHAR(1)
, dob DATE
-- allowing NULL here allows for 'embryonic' records without city/state/country info.
, city_id INTEGER references cities3(city_id)
);
Though this NOT NULL
constraint will avoid duplicates in {city,state,country}, it will also impose them to being NOT NULL, obviously. This might be impossible or invalid in other countries (than Canada or the US). In the Netherlands, we don't have state
or county
; we do have provincie
, which is hardly used (only to disambiguate, if needed) Similar for the French departements
, IIRC.