Database Normalization chaining

Question

I have multiple table chaining like so:

Table1

product_id SERIAL NOT NULL,
name varchar,

Table2 ( kept separate from table1 because same product name but can be different color )

table2_id
product_id integer,
color varchar,
FOREIGN KEY (product_id) REFERENCES table1 (product_id) ON DELETE CASCADE

table3 ( kept separate from table2 because same product color but can be different size)

table3_id
table2_id integer,
size varchar,
FOREIGN KEY (table2_id) REFERENCES table2 (table2_id) ON DELETE CASCADE

e.g product data could exist in such a manner:

a chair (name)  -  red (color)  - 100cm(size)
a chair (name)  -  red (color)  - 200cm(size)
b chair (name)  -  green (color)  - 100cm(size)
b chair (name)  -  green (color)  - 200cm(size)
c chair (name)  -  black (color)  - s(size)
c chair (name)  -  black (color)  - m(size)
d chair (name)  -  black (color)  - null(size)
e chair (name)  -  gold (color)  - big(size)
e chair (name)  -  gold (color)  - small(size)

To normalize the tables (ie,to remove the duplicates), I separated them as 3 tables but I'm not sure whether chaining like this correct or not.

What you are doing is *not* normalizing. Normalization does not introduce new columns (eg ids). You are also not giving the information needed for normalization (FDs and JDs). And if product_ids are 1:1 with names in table1 then the table of example data already *is* in 5NF. So you have some misconceptions. I already explained this [here](http://stackoverflow.com/questions/31841269/postgresql-database-design-for-ecommerce) (just edited). — philipxy, Aug 16 '15 at 13:42
1. Your "because"s are not clear and you don't explain the connection between each "because" and the existence of the corresponding table. 2. *Please* say a) what a row in your "product data" "table" states in terms of product_id, name, color and size and b) all cases of when a column's value must always appear with the same subrow value for a set of colums that doesn't include it. — philipxy, Aug 16 '15 at 14:27

score 0 · Answer 1 · answered Aug 07 '15 at 06:51

0

Yes that's fine, but arguably the model ought to be something like:

product -< product_variant >- product_size

                  V
                  |
           product_colour

answered Aug 07 '15 at 06:51

David Aldridge

51,479
8
68
96

Thanks for reply, do you mean my data like your example? I'm not sure I understand what you mean? I think its more like `product -> product_color -> product_size ----> product_price` and `product -> product_color -> product_size ----> product_qty` seems I have to chaining from `product -> product_color -> product_size` – user1775888 Aug 07 '15 at 10:29
The product variant table can be a child of the Product, Product Size, and Product Colour tables, and contain the price. – David Aldridge Aug 07 '15 at 12:47
1

`@DavidAldridge (& @user1775888): I don't think user1775888 is conceiving of product variants, I think he thinks he has to have an id every time a given subtuple of value appears more than once. Ie I think the data he is actually concerned with is the example data & that it's already in 5nf. – philipxy Aug 16 '15 at 14:31

score 0 · Answer 2 · edited May 23 '17 at 12:23

It seems like maybe you think that you need an id every time a given subtuple of value appears more than once in a table. Ie that you are trying to remove "duplicates" that do not need to be removed. The same value appearing more than once is not a "duplicate" in the normalization sense and is not necessarily bad, either. Also neither your use of multiple tables nor of ids is normalization. (I have already tried to address this table definition "chaining" anti-pattern here.)

From this answer:

The problem is that a bus stops several times at the same stop in different hours, how could I store this information avoiding redundancy?

There is no such problem. A subrow can appear more than once in a table whether or not there is "redundancy". (Whatever you think that means. Although one can replace subrows that appear multiple times by ids plus another table, then one needs more joins for the same query result. See this answer.)

Or from that answer's link:

This has nothing to do with normalization. Normalization never creates new column names. 'I don't want "Nintendo" to be duplicated' is misconceived. There is nothing wrong per se with values appearing in multiple places. See the answers by sqlvogel & myself here.

"Redundancy" is not about values appearing in multiple places. It is about multiple rows stating the same thing about the application. When using a design like that there are two basic problems: to say certain things multiple rows are involved (while the normalized version involves just one row); and there is no way to say just one of the things at a time (which normalization can help with). If you make two different independent statements about Nintendo then you need two tables and Nintendo mentioned in each one. Re rows making statements about the application see this. (And search my other answers re a table's "statement" or criterion".) Normalization helps because it replaces tables whose rows state things of the form "... AND ..." by other tables that state the "..." separately. See this and this. (Normalization is commonly erroneously thought to involve or include avoiding multiple similar columns, avoiding columns whose values have repetitive structure and/or replacing strings by ids, but although these can be good design ideas they're not normalization.)

From my answer at that answer's first link:

If SUBJECT_MODULE is rows where "[SUBJECT_NAME] has [MODULE_NAME] identified by [MODULE_ID]" and a subject might have more than one module then somewhere you must have multiple mentions of that subject (perhaps via its name) with mentions of different modules (perhaps by name or id). That would not involve redundancy.

score -1 · Answer 3 · edited May 23 '17 at 11:59

-1

This is the second version of the same question you have submitted. The answer is the same.

A product is an entity. Size and color are attributes.

A table contains entities. Each entity is represented by a row. The fields of each row are the attributes of each entity. You do not create a new table every time you need to add an attribute to an entity. That would make relational databases impossible to work with.

edited May 23 '17 at 11:59

Community

1
1

answered Aug 07 '15 at 16:46

TommCatt

5,498
1
13
20

Entities are not represented by rows, they're represented by values. Attributes are binary relations which are represented by pairs of values, not by single columns. Tables/rows represent one or more relations between entities. See Chen's paper `The Entity-Relationship Model - Toward a Unified View of Data`. – reaanb Aug 08 '15 at 05:06
I admit to not being familiar with Chen and you don't provide a link to do so. I can only refer to the entirety of relational database writings -- pretty much Date, et al. Your description of rows, entities and relations make sense only in the very narrow context of intersection tables where the row indeed represents a relationship. However, intersection tables were not the subject here. – TommCatt Aug 08 '15 at 19:43
Ok, I found the reference. It turns out that what you described is a tuple from a relationship set (page 12). That is far from the topic of this conversation. – TommCatt Aug 08 '15 at 20:46
See sections 2.3.1 on pages 6&8 regarding representing entities, and 2.2.3 on page 4 regarding attributes. It appears that your interpretation matches Date's (at least in `An Introduction to Database Systems` 8th edition chapter 14). I support Chen's interpretation which maps entities/sets to values/domains, not to tuples/relvars as Date suggests. I believe Date's argument re the First Great Blunder applies to entity sets as well, since mapping them to relvars contradicts the dictum that a relvar isn't a domain, while entity sets are clearly the domains of attributes and relationships. – reaanb Aug 09 '15 at 07:09
If you continue on to 2.3.2, you will see a nice conversion from conceptual to logical form for the entity EMPLOYEE. Figure 7 shows exactly what I have been saying. Chen's own description: "The whole table in Figure 7 is an entity relation, and each row is an entity tuple." The next stage would be the physical table. That differs only in that the logical attribute "Name" would be two fields First-Name and Last-Name. In any event, while this is interesting to me and, I assume, you, it still has nothing to do with OPs initial question. Or am I missing something? – TommCatt Aug 11 '15 at 02:37
You meant to help the OP by explaining how the conceptual level maps to the physical. I mean to help in a similar way, by explaining that the conceptual maps to the logical which maps to the physical. The mapping between the levels isn't one-to-one. – reaanb Aug 11 '15 at 17:21

Database Normalization chaining

3 Answers3

Linked