How do I normalise this relation (i.e. make it conform to 1NF, 2NF, and 3NF)
CREATE TABLE IF NOT EXISTS series (
series_id SERIAL PRIMARY KEY,
dimension_ids INT[] UNIQUE,
dataset_id INT REFERENCES dataset(dataset_id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS dimension (
dimension_id SERIAL PRIMARY KEY,
dim VARCHAR(50),
val VARCHAR(50),
dataset_id INT REFERENCES dataset(dataset_id) ON DELETE CASCADE,
UNIQUE (dim, val, dataset_id)
);
Where subsets of dimension_id
's uniquely identify records in the series
table.
EDIT
To provide more information, the data I want to store comes from XML structures looking something like the following
<?xml version="1.0" encoding="utf-8"?>
<message:StructureSpecificData >
<message:Header>
<message:ID>IREF757740</message:ID>
<message:Test>false</message:Test>
<message:Prepared>2020-04-09T14:55:23</message:Prepared>
</message:Header>
<message:DataSet ss:dataScope="DataStructure" ss:structureRef="CPI" xsi:type="ns1:DataSetType">
<Series FREQ="M" GEOG_AREA="WC" UNIT="IDX">
<Obs OBS_STATUS="A" OBS_VALUE="75.5" TIME_PERIOD="31-Jan-2008"/>
<Obs OBS_STATUS="A" OBS_VALUE="75.8" TIME_PERIOD="29-Feb-2008"/>
<Obs OBS_STATUS="A" OBS_VALUE="77" TIME_PERIOD="31-Mar-2008"/>
<Obs OBS_STATUS="A" OBS_VALUE="77.5" TIME_PERIOD="30-Apr-2008"/>
<Obs OBS_STATUS="A" OBS_VALUE="78" TIME_PERIOD="31-May-2008"/>
<Obs OBS_STATUS="A" OBS_VALUE="78.8" TIME_PERIOD="30-Jun-2008"/>
</Series>
<Series FREQ="M" GEOG_AREA="NC" UNIT="IDX">
<Obs OBS_STATUS="A" OBS_VALUE="75.5" TIME_PERIOD="31-Jan-2008"/>
<Obs OBS_STATUS="A" OBS_VALUE="75.8" TIME_PERIOD="29-Feb-2008"/>
<Obs OBS_STATUS="A" OBS_VALUE="77" TIME_PERIOD="31-Mar-2008"/>
<Obs OBS_STATUS="A" OBS_VALUE="77.5" TIME_PERIOD="30-Apr-2008"/>
<Obs OBS_STATUS="A" OBS_VALUE="78" TIME_PERIOD="31-May-2008"/>
<Obs OBS_STATUS="A" OBS_VALUE="78.8" TIME_PERIOD="30-Jun-2008"/>
</Series>
</message:DataSet>
</message:StructureSpecificData>
There is a dataset that contains series
(0...n) that contain observations (0...n). The series are uniquely identified by their XML attributes - what I call dimensions in my data model. In my example I have two series
, differentiated by the geographical areas they cover. Any series
can have an arbitrary number of dimensions. series
are expected to be queried from its dimensions and the dimensions will also be queried using the series_id
. The obvious solution is a bridging table:
CREATE TABLE series_dimension
series_id INT REFERENCES series(series_id) ON DELETE CASCADE,
dimension_id INT REFERENCES dimension(dimension_id)
);
This solution permits, however, the following scenario:
|--------------------------|
| series_dimension |
|-----------|--------------|
| series_id | dimension_id |
|-----------|--------------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
|-----------|--------------|
That is, two different series
with the same dimensions, so that if I query a series
for a given set of dimensions I can't decide in the case of dimensions [1 2 3 4]
whether I am looking for series_id
=1 or series_id
=2 which is unacceptable. Is it therefore the case that in such a situation, I must decide between having referential integrity and the uniqueness property I have just explained?