0

Which table design is prefered when considering database normalization rules:

Consider the following designs for a node table that describes an ordered tree. The key is a composite key.

  • ck: composite key
  • parent: parent node
  • next: the next node, where the next.parent = parent.
    (this defines a forward linked list)
  • sk: surrogate key

design 1:

node(ck_x, ck_y, parent_ck_x, parent_ck_y, next_ck_x, next_ck_y)

design 2:

node(sk, ck_x, ck_y, parent_ck_x, parent_ck_y, next_ck_x, next_ck_y)

design 3:

node(sk, ck_x, ck_y, parent_sk, next_sk)

The first design has 6 columns. The second design adds a surragete key and has 7 columsn. The third design has 5 colums it uses the surrogate to save a column.

Are there any normalization rules (or other database design rules) that prefer one design over the others?

update

Alternative designs: subtype the node table, isParent flag, nested sets. These designs have a larger read/write complexity.

design 4:

  • This design splits the table into 3 tables. The parent and next table contain a mutually exclusive subset of the keys from the node table. It uses 2+4=6 columns for each node.

    node(ck_x, ck_y)

    parent(ck_x, ck_y, parent_ck_x, parent_ck_y)

    next(ck_x, ck_y, next_ck_x, next_ck_y)

design 5:

  • This design uses a isParent flag to indicate that the next item is the parent. It uses 4+1=5 columns, and 1 column is only a bit. Which is less space than 5 columns as used in design 3)

    node(ck_x, ck_y, next_ck_x, next_ck_y, isParent)

design 6:

  • This design uses a nested set to create an ordered tree. The composite key is no longer used to define the parents or the order of the children. It uses 2+2=4 columns. But the lower and upper bound column should be both using the sizeof(ck_x)+sizeof(ck_y) which is equal the space of 6 columns used in design 1.

    node(ck_x, ck_y, lowerBound, upperBound)

update

design 7:

  • this uses a index for the position of the node.

    node(ck_x, ck_y, parent_ck_x, parent_ck_y, index)

Notes

  • Using the previous node i.s.o. the next node reduces creation and addition to a single insert compared to insert and update.

  • Normalization is not about the number of columns or tables.

Tom Fuller
  • 5,291
  • 7
  • 33
  • 42
Wouter
  • 2,540
  • 19
  • 31
  • Please give for each table a statement parameterized by column names that a row in it says about the tree. – philipxy Jan 02 '16 at 08:28
  • 2
    Generally you do *not* want to embed a linked list into a relational design. The preferred course for storing order is to assign an ordering-key whose value indicates the order of siblings within a group. While this does require re-assigning keys to all siblings within a group, this is much preferable given how relational queries actually work. – RBarryYoung Jan 03 '16 at 19:27
  • Thanks. The thing is when using next the parent information is kind of duplicated because we already know the parent for the first sibling. Using ordinal position this duplication is removed as there is no way to find the siblings other than via the parent. My concern is that using a next pointer this is somehow not conform a normalized form (guidline). Using an ordinal solves this but it doesn't answer my concern. Given a database using a next pointer? Is it conform 1 2 3 4 5 6 NF? Is there any design guidline that is violated using next pointers. Design 6 is kind of similar. – Wouter Jan 03 '16 at 22:03

1 Answers1

1

TL;DR Guessing from what you have given us: From the likely functional dependencies (which determine candidate keys) from how you appear to be representing trees, all of these designs are in 5NF. No design here needs to be changed for reasons of normalization.


Normalization replaces a table by other tables without introducing new columns. Normalization to BCNF requires knowing all functional dependencies; normalization beyond that requires knowing about join dependencies. That requires knowing what a row states about the situation by being in a table and what situations can arise. (If we know that some columns are unique then we know that they functionally determine all columns.)

First pick designs by which you can describe any situation that can arise and by which you can express queries about those situations. Normalization might then improve a design (in terms of reducing update anomalies or certain redundancies), but other aspects of the appropriateness or quality of your design are independent of normalization. Tree-structured relation designs must be chosen with care to suit intended usage and DBMS characteristics.

PS 1 Normalization does not introduce surrogates. Functional dependencies matter; you haven't given them. Candidate keys matter; you haven't given them. (Primary keys only matter in that they are candidate keys.) Read about the basic notions and steps of normalization.

PS 2 The constraint on linked-list siblings sharing the same parent is not addressed by normalization. Just because there are repeated values there needn't be redundancy; redundancy is about repeated statements made by rows being in or not in tables. Re "redundancy". In fact the obvious basic relational way to describe a tree is just a table "parent P has child C" or "parent P has Nth child C". You are relationally representing a tree. Don't relationally represent a (non-relational) representation of a tree.

PS 3 Your concern with the number and size/space of columns (including use of surrogate keys) is almost certainly misdirected. Just make the most straightforward design given your queries and updates. (Which, again, you haven't described.)

Community
  • 1
  • 1
philipxy
  • 14,867
  • 6
  • 39
  • 83
  • So normalization does not require to introduce a surrogate key in order to remove a column because there is no functional dependency on the primary key? But we do know there is some kind of dependency because the next item is limited to those only where the parent is the same. Maybe parent(ck_x, ck_y, parent_ck_x, parent_ck_y), next(ck_x, ck_y, next_ck_x, next_ck_y) defines a higher normalized form? – Wouter Jan 03 '16 at 13:38
  • @Wouter: normalization is most definitely **not** about introducing surrogate keys. –  Jan 03 '16 at 16:27
  • Thanks PS2 answers my question. Though it still looks like redundancy. It is not relevant for normalization. – Wouter Jan 03 '16 at 22:14
  • @Wouter I added a link in my answer to answers about repeated values vs redundancy. – philipxy Jan 05 '16 at 06:23
  • Thanks. The redundancy I was considering is the information about sibling having the same parent. In a way you already know the parent if you know it is a sibling. Normalization (2,3 and BCNF) also deal with redundancy. My designs 4 and 5 store it only once. But finding the parent for any node then requires recursive queries. Design 6 doesn't need recursive queries and uses the bounds to store the recursion. (But update queries will also become complexer.) – Wouter Jan 05 '16 at 23:02