What kind of normalization rule does this violate?

Question

Suppose I have two tables on a database, T₁₀ and T₁₁, having 10 and 11 columns, respectively, where 10 of the columns are exactly the same on both.

What (if any) normalization rule am I violating?

this wouldn't happen to be something like: TableX with 10 columns and TableXHistoryLog with 10 columns + ChangeDate column (11 columns)?? — KM., Mar 04 '10 at 18:44

Alison R. · Accepted Answer · 2012-01-25T15:27:29.917

9

Edit: I have been informed that no Normal Forms are violated here in theory. Since this was the accepted answer, I'm leaving it here for reference, and because thinking about 3NF may in practice help avoid situations like that in the question.

You are violating the Third Normal Form (3NF), because if mostly the same data is held in both tables, then every attribute of each table is not directly dependent on the key of its respective table.

edited Jan 25 '12 at 15:27

answered Mar 04 '10 at 18:39

Alison R.

4,204
28
33

1

What if all 10 columns are the key? – DVK Mar 04 '10 at 18:45
I would think that the only way you'd be in 3NF, if all ten columns are the key in the first table, would be if all eleven columns were the key in the second, so we'd be talking about different entities, not the same entity (table 1) vs. the same entity (table 2) plus one attribute. – Alison R. Mar 04 '10 at 18:48
Thanks to all that responded, it really seems to be a case of 3rdNF violation. – Otávio Décio Mar 04 '10 at 19:13
If all 10 columns were keys then you would not need two tables anyways because they are still duplicates in both tables. – Jonathan Czitkovics Mar 05 '10 at 00:14
2

@AlisonR.: would you expand on this, please? From my understanding of 3NF (indeed, any normal form), it only applies to attributes in the same relation, so having both tables in the same database won't affect whether each is in 3NF. – outis Jan 09 '12 at 10:45
@outis, I would respectfully disagree with that statement. The process of normalization necessarily means considering whether data should belong to one relation or another. The very definition of 3NF says that the attributes of the table should refer to its key and only its key. If they refer to the key of another table, then it's not 3NF. – Alison R. Jan 24 '12 at 16:44
1

@AlisonR.: I still don't see it. From my understanding, the definition is based on dependencies (rather than referring), and the definition of "dependency" only addresses attributes within the same relation. Would you have time to discuss it in [chat](http://chat.stackexchange.com/rooms/2290/rdb)? – outis Jan 24 '12 at 19:36
@outis, why not pose a question at http://dba.stackexchange.com/? (And post a link here, so we can participate and/or follow along.) – Alison R. Jan 24 '12 at 20:48
@AlisonR.: I'm hesitant because it's (almost?) a dup of this one. I suppose it's a "why", not a "which", which might be enough. Perhaps I'll try, but with the apprehension that it may get closed. – outis Jan 24 '12 at 21:02
@AlisonR.: one quick question that may clarify things: are you talking from a standpoint of practice, rather than theory? That could be the reason we have different understandings of 3NF. If so, do you have a reference for the definitions you're using? Everything I've been able to find has either defined 3NF purely in terms of theory or doesn't give an exact definition, and (when it comes to practice) only uses examples of separating a single table into two. – outis Jan 24 '12 at 21:49
@outis Yes, I speak in terms of practice. I think where this discussion differs from the theory, as you briefly mentioned, is that the theory approaches normalization from the standpoint of taking a single heap of data and dividing it into multiple tables. This discussion is about going in the opposite direction, which if one had correctly followed the theory in the first place, wouldn't be necessary, and so the theory doesn't directly address it. All my references in this case came from the Wikipedia entries on normalization, which I already linked to. – Alison R. Jan 24 '12 at 22:04
2

@AlisonR. No NF is violated. The design could arise in 5NF or even 6NF tables. You are I think talking about "Normalization by Decomposition", which is just one procedure for arriving at a design in some given NF. Decomposition is not the only method and isn't a requirement of normalization theory. Synthesis is one alternative. – nvogel Jan 25 '12 at 06:48

outis · Answer 2 · 2012-03-25T20:33:33.690

Believe it or not, duplicating columns across tables violates no theoretical normal form in and of itself. Except for domain/key normal form (DKNF), normal forms are defined in terms of individual, not multiple, tables. DKNF is defined in terms of constraints, of which there are none in the general case. Thus, if there's a violation of a normal form:

it must be specific to one of the tables and exists independently of having both tables (i.e. the table would still violate the normal form even if you removed the other table), or
the relation has a constraint that violates DKNF, which means it isn't an example of the general case outlined in the question but a more specific case. It's not the duplicate columns that create the violation but instead the additional constraint on the extra column.

Consider the normal forms, using the brief definitions from the Wikipedia article:

1NF: The table faithfully represents a relation and has no repeating groups.

This one's fairly straight forward. The term "repeating groups" has multiple meanings in the theory, but none of them have anything to do with duplicate columns or data.
2NF: No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key.

Here, the important term to examine is "functional dependence". Essentially, a functional dependence is where you project a relation to two columns, X and Y, and wind up with a function X → Y. You can't have a functional dependency between two (or more) tables^*. Additionally, candidate keys can't span multiple tables.
3NF: Every non-prime attribute is non-transitively dependent on every candidate key in the table.

Transitive dependence is defined in terms of functional dependence: a transitive dependency is a dependency where X → Z only because X → Y and Y → Z. X, Y and Z must be in the same table because these are functional dependencies.
4NF: Every non-trivial multivalued dependency in the table is a dependency on a superkey.

Multivalued dependency is a little trickier, but it can be illustrated with an example: "whenever the tuples (a,b,c) and (a,d,e) exist in r, the tuples (a,b,e) and (a,d,c) should also exist in r" (where "r" is a table). Most importantly for the matter at hand, a multivalued dependency only applies to a single table.
5NF: Every non-trivial join dependency in the table is implied by the superkeys of the table.

A table has a join dependency if it can be expressed as the natural join of other tables. These other tables, however, don't need to exist in the database. If table T₁₁ in the example had a join dependency, it would still have one even if you removed table T₁₀
6NF (C. Date): Table features no non-trivial join dependencies at all (with reference to generalized join operator).

Same reasoning for 5NF.
Elementary Key Normal Form (EKNF): Every non-trivial functional dependency in the table is either the dependency of an elementary key attribute or a dependency on a superkey.

Same reasoning for 2NF.
Boyce–Codd Normal Form (BCNF): Every non-trivial functional dependency in the table is a dependency on a superkey.

Same reasoning for 2NF.
Domain/Key Normal Form (DKNF): Every constraint on the table is a logical consequence of the table's domain constraints and key constraints.

If T₁₁ has a constraint that depends on T₁₀, then it's either a key constraint or a more complex constraint that still refers to T₁₀. The latter case is not the general case mentioned in the question. In other words, while there could be specific schemata with duplicate columns that violate DKNF, it's not true in general. Furthermore, it's the constraint (not the normal form) that's defined in terms of multiple tables and the constraint (not the column duplication) that causes DKNF violation.

The purpose for normalization includes preventing anomalies. However, normalization isn't complete in that it doesn't guarantee a relational database will be completely free from anomalies. This is one instance where practice diverges from theory.

If this still doesn't convince you, consider the schema KM.'s comment hints at, where T₁₁ represents a history (or versioned) version of T₁₀. The primary key of T₁₁ consists of the primary key columns held in common with T₁₀, plus the extra column (the date/version column). That T₁₁ has different candidate keys makes all the difference between an anomaly-prone and anomaly free, normalized design.

^*Someone might think that you could use joins to create a dependency between two tables. While a join might create a table that has a dependency, the dependency exists on this table, not between the constituents of the join. In the case at hand, this again means that one of the tables would be a joined table and would suffer from the dependency itself, irrespective of the other table in the database.

DKNF is something that a table is in or not in regardless of multi-table constraints. "Intuitively, a relation schema is in DK/NF if every constraint can be inferred by simply knowing the set of attribute names and their underlying domains, along with the set of keys." "An *X-constraint* (or simply a *constraint* if X is understood) is a mapping from the collection of all X-relations into {TRUE, FALSE}." Read Fagin's paper. — philipxy, Sep 15 '15 at 06:04
Also DKNF is not in the sequence of NFs leading to 5NF & 6NF. It is just not relevant to such decomposition based on NFs & JDs. PS It turns out that normalization to higher NFs is not just about eliminating update anomalies, because they disappear as of ETNF between 4NF & 5NF. — philipxy, Sep 27 '19 at 03:20

score 4 · Answer 3 · answered Mar 04 '10 at 18:37

4

Perhaps the rule of avoiding redundant data? (i.e. the same data in two tables)

answered Mar 04 '10 at 18:37

Ta01

31,040
13
70
99

score 4 · Answer 4 · answered Mar 04 '10 at 18:38

4

if 10 of the 11 columns are the same, why can't this just be one table, where the 11th column is left blank (along with a possible 12th column to denote which type of data it is, i.e. which table it would have been in originally)?

answered Mar 04 '10 at 18:38

GSto

41,512
37
133
184

Because storing 5 values in 100M row table is not a Good Idea storage-wise, as an example – DVK Mar 04 '10 at 18:40
If the two tables represent differant "things" it would be a bad idea. Each table should represent one thing or idea. – redcayuga Mar 04 '10 at 21:27
@redcayuga it depends though. if they're really two different things, why are 10 of the 11 columns exactly the same? Can you think of an example where this would be the case? – GSto Mar 04 '10 at 21:32

score 4 · Answer 5 · answered Mar 04 '10 at 18:41

It depends what's in the tables.

If no records are related to each other (for instance, if one table is simply archived records originating in but removed from the first table) you're not violating any rules.

But if those are the same records in each table, you have a dependency problem — that eleventh column is dependent only on the key value from the record, not the additional columns. Assuming that all ten columns are not involved in the primary key, you've violated 3rd NF.

nvogel · Answer 6 · 2014-12-29T06:24:35.730

3

Having two identical or near-identical relations is not itself a violation of any of the usual Normal Forms. Outis has very comprehensively explained why. It might well be a violation of the Principle of Orthogonal Design however, which is another aspect of relational database design theory.

edited Dec 29 '14 at 06:24

answered Jan 25 '12 at 06:32

nvogel

24,981
1
44
82

score 0 · Answer 7 · answered Mar 04 '10 at 18:44

0

If all 10 columns are part of your key, then Second Normal Form: Eliminating Redundant Data. Specifically, this falls under "Nonsurrogate Versus Surrogate Primary Keys" dilemma - to be honest, I don't recall either of those two choices to be "violating" 2NF, but the surrogate key is definitely closer to the spirit of 2NF

answered Mar 04 '10 at 18:44

DVK

126,886
32
213
327

A table may or may not satisfy 2NF whether it has a surrogate key or not. I don't see that surrogate keys are closer to the "spirit" of 2NF. – nvogel Mar 27 '12 at 14:34

score 0 · Answer 8 · answered Mar 04 '10 at 18:45

0

Only primary keys may be redundant between tables. Having any amount of non primary key columns in multiple tables violates third normal form.

answered Mar 04 '10 at 18:45

Jonathan Czitkovics

1,642
11
11

What kind of normalization rule does this violate?

8 Answers8