Normalization to relational form. Representing nested array in separate table

Question

I need to migrate non-relational database to relational. Non-relational database contains M tables Oi with data. Some of data tables Oi have field (cell) that contains array of primitive elements. Arrays can be of different numeric types and dimensions but all of them can be stored as array of doubles (4-byte int and float can be converted to double reversible). Relational DB will contain these M data tables plus additional tables with schema and one table N that will contain array data for all M tables (but it should has separate row for each array element). In other words, to represent arrays in relational DB I'm going to introduce new table N for array data that can be associated with original cell in table Oi. There will be one table N that will contain array data for all tables O1, O2, ... Oi,...OM. For full picture I mentioned about schema tables but it seems they do not matter.

There are two options:

First option:

Table Oi (1<= i <= M) (with name APK) and columns:

PKO (counter of each row - original non-relational table has no PK)
other columns.

Table N with columns:

(part of PK of N) APK (name of table Oi)
(part of PK of N) PKO (foreign key to table O)
(part of PK of N) index in array
j-th array value

Second option:

Table Oi (1<= i <= M) with columns:

PKN1 (foreign key to table N)
other columns.

Table N with columns:

(part of PK of N) PKN1 (counter - unique for each array and equal for all elements in particular array)
(part of PK of N) index in array
j-th array value

Notice: APK is a string around 100 chars and first option repeats APK too much. Maybe it's worth to add an alias - integer ID to each string of APK... The semantics of arrays is unknown - the time restriction is very tough. The number and kinds of tables is big. This database is pattern - there will be several dozens of such databases. So the main goal is to accelerate development(One table N is fast and simple solution). Other goal is be able to use SQL to select each array element without imperative programming. This tables store data for testing another data, so QA should easily access data. Which option is better - or are there any suggestions?

Do the arrays of doubles in table Ox have the same *meaning* as the arrays in table Oy? They appear to be structurally the same, but that doesn't mean that they should necessarily be modelled together. Also, I would try to avoid putting a database table name as a column name (it is sometimes unavoidable). It often means that you can't easily manipulate the data e.g. via joins - you end up having big switch statements etc. which are unwieldy and need to be edited when you want to add a new data table. It's not a definite no, but it comes with a big long-term cost. — Bob Salmon, May 03 '16 at 06:45
@Bob Salmon updated within question. In short - different meaning. Thank you. — Vlad, May 03 '16 at 08:07

score 0 · Answer 1 · answered May 03 '16 at 07:30

A couple points.

First, arrays don't necessarily break 1NF as long as what you are modelling actually is a good fit for a mathematical matrix. In other words, position matters, the array has fixed size on all dimensions etc. So you haven't provided enough information to say whether breaking this off is necessary for normalization or not.

For example, if I represent an IP address as '127.0.0.1' or as ARRAY[127,0,0,1], neither approach breaks 1NF. So this is one reason to start with your data model and not rule out arrays a priori.

Secondly you need to make sure that the functional relationships within the tables are constant. Usually semantics is a proxy here, but it is often better to look at the actual functional dependencies before going further. What does a row represent (the first question is semantic)? What are the functional dependencies within the row?

Those questions will guide your design more than any answer you can get at your current level of detail.

Thank you. Updated. In many places - sorry. in short - arrays in cell will prevent accessing via SQL. The dimensions of arrays are different, so, one multidimensional matrix is not possible. deadline and quantity of tables prevent of deep investigation of semantics. This tables for regression tests - they should be observable and comparable with original data. — Vlad, May 03 '16 at 08:12

score 0 · Answer 2 · edited May 23 '17 at 11:59

In the light of your response to my comment, I would identify all the different kinds of thing that the arrays could represent (both in terms of meaning and, as @Chris Travers says, relationship to other things). Then have a table per kind of thing, and a row in it per member of one of the nested arrays.

Then do a foreign key to the other tables (the new ones that hold the data that wasn't originally nested arrays).

So, if the nested arrays could be ids for Dogs and Ships (for example) and the rest of the data is in new tables O1 to O22 (hopefully you'd have better names):

Dog

DogId
... other dog data
WasNestedArrayId - FK to WasNestedArray

Ship

ShipId
... other ship data
WasNestedArrayId - FK to WasNestedArray

O1

Id
... other data
WasNestedArrayId - FK to WasNestedArray

and so on

O22

Id
... other data
WasNestedArrayId - FK to WasNestedArray

WasNestedArray (please come up with a better name!)

Id
... other data about the array

So when you come across a new nested array you:

Create a new record in WasNestedArray
Determine which kind of array it was (Ships or Dogs) and insert the members of the array as separate rows in that table, each labelled with the relevant WasNestedArrayId
Create a new record in Ox for the rest of the data that was alongside the nested array, and refer to the same WasNestedArrayId.

It's an instance of polymorphic association, made a bit more complicated by the fact that you've got many source tables (O1 to ON) rather than a single one.

All of this assumes that there are no duplicate e.g. ships in the various nested arrays. You could either be a bit hacky and have duplicates in the Ship etc. tables (via different ids) or put a layer of WasNestedArrayHasShip, WasNestedArrayHasDog etc. tables to resolve the many to many relationship.

Normalization to relational form. Representing nested array in separate table

2 Answers2