Normalising a database to reduce dependency

Question

I am trying to get my database design right. It is a large set of alcohol drinks consisting of beer, liquor, wine and so on. I could keep it all in a single table as follows:

id category brand     type    price  quantity  description  

1  Beer     Heineken  bottle  $2.00  100       some description...
2  Beer     Calsburg  bottle  $3.00  200       some description
3  Beer     Heineken  can     $1.00  300       some description....
4  Liquor   JWalker   bottle  $30.00 100       some descri...

Seems this is bad design considering repetitions for category and brand will occur. Thus I split it into 3 tables as follows:

Category Table

id   name(pk)

1    Beer
2    Liquor

Brand Table

id   name(pk)    category_name(FK)
1    Heineken    Beer
2    Carlsburg   Beer
3    Lindemans   Wine
4    JWalker     Liquor

Product Table

id(PK)     type    price    quantity    description       category_name(FK) brand_name(FK)
1          Bottle  $2.00    100         some description  Beer              Heineken

Thought this would be better normalised but the way I see it, hardly a difference from the first table. And I end with type repeatnig too since I can get repetitions on bottle, can and so on. So should I get a 4th table for that?

Trying to normalise and keep it as sensible as possible. Is there a better way to go about doing this?

The foreign key values should be the id of the row in the parent table, not the value. — duffymo, Jun 16 '16 at 14:13
I think it's fine to use the name as the PK if they are always unique, however, as OP pointed out... with no additional data associated with brand or category, there's not much advantage to splitting off the tables. — mhatch, Jun 16 '16 at 14:15
Say Heineken gets bought out/sold by/to another company and/or gets rebranded. Do you want to have to update all products or just one record in your brand table? A single table may be tempting but reduces flexibility long term (maybe not reduce but add's rework). Are you planning on this system growing changing? If so get as close to 3rd/4th as possible. ***If*** this is really all it's going to be and you don't anticipate changes then a single table will be "faster" just maintenance more difficult. — xQbert, Jun 16 '16 at 14:16
And that Types category should be a separate table, today Heineken makes beer in bottles, tomorrow they offer kegs in two sizes, next year they offer a special commemorative oversize bottle with cork for one week. Types should not only be a separate table but have valid from/to dates — kevinskio, Jun 16 '16 at 14:19
The product's table should only have the brand_id field, not the brand_name, nor category_name/ category_id. (The category relation is already found in the Brand table). It does appear as though you should have a 4th table with the type of container (bottle, can, keg, box, etc.) That being said, SO is not going to be able to teach you how to properly normalize a database design (The concept is too large.) — AgRizzo, Jun 16 '16 at 14:22
@duffymo I wanted to use the id just for indexing purposes. The values are unique thus thought to use them as PK which also makes it more readable. — user3050832, Jun 16 '16 at 14:29
That defeats the whole purpose of normalizing. If you want to read the values you do a JOIN on the primary key of the parent and the foreign key of the child. — duffymo, Jun 16 '16 at 14:30
@xQbert Just a bit divided cos this system is expected to grow but not anytime soon. Maybe in a year or more. I could stuff everything into a single table which makes it easy for now but trouble in future. Then again wondering if I should place the effort for something that may or may not expand in a year. — user3050832, Jun 16 '16 at 14:32
So cost benefit, quick and easy now, more pain later. or a little more pain now for less pain/rework later. Do you have the time now and willing to go though a bit more pain? if not, go quick easy and correct later. But if alot of coding going against this structure now, code it right now otherwise alot of rework later.. — xQbert, Jun 16 '16 at 14:36
The accepted answer is not normalizing. Normalization doesn't introduce new columns. You haven't given enough info to normalize; but by common sense and column names, your first table is in 5NF. You seem to think that repeated subrow values imply a need for normalization. They don't. (See my answer.) Please give the FDs that hold in your table, and any other cases where it is always the join of other tables . Please say when a row (id,...) goes in your tab.e. PS Can you clearly explain what you mean by "repetitions for category and brand"? — philipxy, Jun 17 '16 at 09:16
@philipxy ID's I'm not sure what you mean, nor am I sure the comment to which you are referring. — xQbert, Jun 17 '16 at 12:36
I mean your two comments on this question. I don't see how I can be clearer that "Adding ids has nothing to do with normalization". The question asks "Trying to normalise and keep it as sensible as possible. Is there a better way to go about doing this?". Normalization involves replacing a relation with others that always join to it. [It does not involve introducing new columns.](http://stackoverflow.com/a/32151278/3404097). Please see also my other comments on this page. — philipxy, Jun 17 '16 at 12:51
You seem to think that normalization involves replacing strings with numbers. It doesn't. — Mike Sherrill 'Cat Recall', Jun 23 '16 at 18:23
@ user3050832 & @duffymo As a start see the [Relational Design Overview](https://lagunita.stanford.edu/courses/DB/RD/SelfPaced/courseware/ch-relational_design_theory/seq-vid-relational_design_overview/) slides & (12") videos & others (eg [BCNF](https://lagunita.stanford.edu/courses/DB/RD/SelfPaced/courseware/ch-relational_design_theory/seq-vid-boyce-codd_normal_form/)) in the rest of the Stanford free online [Relational Design Theory](https://lagunita.stanford.edu/courses/DB/RD/SelfPaced/about) mini-course it's from. (Also at [Coursera](https://www.coursera.org/course/db) & other sites.) — philipxy, Jun 25 '16 at 10:37

score 0 · Answer 1 · answered Jun 16 '16 at 14:18

0

Brand Table
brandID(PK) BrandName

Category table
BrandID(FK) CategoryID(PK) Categoryname

Product table
ProductID(PK) CategoryID(FK) description price quantity

answered Jun 16 '16 at 14:18

Nick MacDonald

11
3

1

The category doesn't need to be related to the brand, BrandID belongs to the product as a foreign key. – Philipp Jun 16 '16 at 14:23

philipxy · Answer 2 · 2016-06-17T10:37:02.247

Normalization requires knowing functional dependencies (FDs) and join dependencies (JDs) that hold. You haven't given them. So we can't normalize. But guessing at your application and your table, it is in 5NF.

Presumably id is a unique column. So it functionally determines every column set. Since no smaller subset of {id} is unique, it is a candidate key (CK). Presumably no other FDs hold other than the ones that hold because of that CK. So the table is in 5NF.

But suppose also one more FD holds: that a given brand only ever appears with the same category. Then to normalize to 5NF column category should dropped and a new table should be added with brand & category columns and CK {brand}.

Or suppose that a brand has one or more categories, and instead of a row stating that category is its product's category, it states that category is a category of its product's brand. (Weird, since then for brands with more than one category the table wouldn't give a product's category.) Then normalization also gives those two tables, with new CK {category, brand}. But in this case it's because of a multi-valued dependency (MVD), ie because of a binary JD.

PS Introducing ids has nothing to do with normalization.

PPS You seem to think that repeated subrow values imply a need for normalization. They don't. Normalization is for sometimes replacing a table by tables that always join to it.

score -1 · Answer 3 · answered Jun 24 '16 at 11:33

Normalization through BCNF is based on functional dependencies. It's not based on whether a column contains text or numbers. You seem to think that, because the category column contains the word Beer more than once, it needs to be "normalized". That's not the case.

So what are the functional dependencies here?

id -> category, brand, type, price, quantity, description
category, brand, type, -> id, price, quantity, description

That second FD might be wrong. It might be that {brand, type} is the determinant. But I think it's likely that there's a company somewhere that makes both beer and liquor under the same brand name. So I think that the determinant is probably {category, brand, type}.

That's in 5NF already. "Splitting" isn't going to improve this table.

score -2 · Accepted Answer · answered Jun 16 '16 at 14:39

-2

Table creation would look something like this:

create table product (
   product_id int not null identity,
   brand_id int not null,
   category_id int not null,
   primary key(product_id),
   foreign key brand_id references brand(brand_id),
   foreign key category_id references category(category_id)
);

create table brand (
    brand_id int not null identity,
    name varchar(80),
    primary key(brand_id)
);

create table category (
   category_id int int not null identity,
   name varchar(80),   
   primary key(category_id)
);

You do a JOIN to get the record back:

select p.product_id, c.name as category_name, b.name as brand_name
from product as p
join category as c on p.category_id = c.category_id
join brand as b on p.brand_id = b.brand_id

answered Jun 16 '16 at 14:39

duffymo

305,152
44
369
561

I have no idea what you're talking about. This is an example to show what foreign keys would look like in this case. Maybe you should review what normalization means. – duffymo Jun 17 '16 at 09:14
I didn't add any new column names. I created two tables, one for category, and another for brand, and made the product table point to them with foreign keys. Seems classic to me. – duffymo Jun 17 '16 at 09:22
1

The question is "Trying to normalise and keep it as sensible as possible. Is there a better way to go about doing this?". The original table is in 5NF. The question's second design is not the product of normalization. It couldn't be anyway because it introduces new columns: ids. Ditto for your answer. One might want to add ids; but it isn't normalization. PS In case you just downvoted my answer, please tell me the first thing you think is wrong and I will add a reference to show my answer is correct. Or just read a textbook or slides (many online) about normalization. – philipxy Jun 17 '16 at 09:32
5NF? I see repeated data, which is what prompted the question in the first place. That's not what I think of when I hear 5NF. It's not even 1NF. – duffymo Jun 17 '16 at 09:34
1

I hope that you will try to find a textbook that disagrees with anything I've written. – philipxy Jun 17 '16 at 09:44
1

@duffymo: *"I didn't add any new column names."* Neither brand_id nor category_id are in the original table. – Mike Sherrill 'Cat Recall' Jun 23 '16 at 18:19
Example only. Think of it as a substitute: I took out the incorrect category FK column that had 'Beer' and replaced it with the correct category_id with a FK that points (correctly) to the primary key in the Category table. Do you have anything constructive to add to this conversation? Or are you happy to troll? – duffymo Jun 23 '16 at 18:22
1

*"Example only. Think of it as a substitute"* With respect to database normalization, it's just a mistake. – Mike Sherrill 'Cat Recall' Jun 24 '16 at 11:25
You're saying that repeating "Beer" in the category column and "Heineken" in the brand column is the correct way to normalize? Inserting keys from the other table is a mistake? – duffymo Jun 24 '16 at 12:02
1

@duffymo: I'm saying that introducing attributes that were not in the original relation is not part of normalization. You introduced two: brand_id and category_id. – Mike Sherrill 'Cat Recall' Jun 26 '16 at 22:46
I'll ask again: You think repeating "Beer" and "Heineken" as category and brand are the proper thing to do? A simple yes or no will do. – duffymo Jun 26 '16 at 23:14
1

@duffymo: [Mu](http://www.catb.org/jargon/html/M/mu.html). Normalization involves decomposition of a relation by taking projections. By definition, a projection cannot introduce either new attributes or new functional dependencies. You introduced two of each. (brand.brand_id->brand.name, and category.category_id->category.name) What you're doing isn't normalization. – Mike Sherrill 'Cat Recall' Jun 27 '16 at 11:24
Can't say yes or no can you? – duffymo Jun 27 '16 at 11:25
Far be it from me to cite Wikipedia, but it agrees with me: see the first and fourth paragraphs. https://en.wikipedia.org/wiki/Database_normalization – duffymo Jun 27 '16 at 11:33
1

@duffymo: Wikipedia doesn't agree with you. You didn't normalize the original relation; it doesn't *have* a unique ID for brand or category. To the original relation--which is probably already in 5NF--you added two new attributes and two new functional dependencies, which creates a new and different relation that's *not* in 5NF. It has two dependencies that don't exist in the OPs original relation. I can see that you grasp the pattern, but you're not grasping the principles. Focus on this principle: normalization doesn't introduce new attributes. *Now* normalize the OP's original relation. – Mike Sherrill 'Cat Recall' Jun 30 '16 at 12:33

Normalising a database to reduce dependency

4 Answers4