24

I have read different tutorials and seen different examples of normalization, specially the notion of "repeating groups" in the first normal form. From them I have have gathered that repeating groups are "kind-of" multi-valued attributes (e.g. here and here).

But we already make separate tables for each multi-valued attribute by including foreign keys from the parent table during the process of mapping an ERM (Entity relationship Model) to a RDM (Relational Data Model)? Reference: this

Secondly, are those "repeating groups" essentially laid out horizontally in the same row, or can the same value occurring in the same column again and again, i.e. the same value of an attribute again and again, also a repeating group and should be eliminated?

enter image description here In this example, the value English is repeating again and again. Is this a repeating group? If I eliminate it to make another table SUBJECT with Subject Name and Module_ID(Foreign key), this is what I get. Sure it gets rid of the repeating value, but I am not sure if this is the right thing. Is it right? enter image description here

Solace
  • 8,612
  • 22
  • 95
  • 183

2 Answers2

41

The term "repeating group" originally meant the concept in CODASYL and COBOL based languages where a single field could contain an array of repeating values. When E.F.Codd described his First Normal Form that was what he meant by a repeating group. The concept does not exist in any modern relational or SQL-based DBMS.

The term "repeating group" has also come to be used informally and imprecisely by database designers to mean a repeating set of columns, meaning a collection of columns containing similar kinds of values in a table. This is different to its original meaning in relation to 1NF. For instance in the case of a table called Families with columns named Parent1, Parent2, Child1, Child2, Child3, ... etc the collection of Child N columns is sometimes referred to as a repeating group and assumed to be in violation of 1NF even though it is not a repeating group in the sense that Codd intended.

This latter sense of a so-called repeating group is not technically a violation of 1NF if each attribute is only single-valued. The attributes themselves do not contain repeating values and therefore there is no violation of 1NF for that reason. Such a design is often considered an anti-pattern however because it constrains the table to a predetermined fixed number of values (maximum N children in a family) and because it forces queries and other business logic to be repeated for each of the columns. In other words it violates the "DRY" principle of design. Because it is generally considered poor design it suits database designers and sometimes even teachers to refer to repeated columns of this kind as a "repeating group" and a violation of the spirit of the First Normal Form.

This informal usage of terminology is slightly unfortunate because it can be a little arbitrary and confusing (when does a set of columns actually constitute a repetition?) and also because it is a distraction from a more fundamental issue, namely the Null problem. All of the Normal Forms are concerned with relations that don't permit the possibility of nulls. If a table permits a null in any column then it doesn't meet the requirements of a relation schema satisfying 1NF. In the case of our Families table, if the Child columns permit nulls (to represent families who have fewer than N children) then the Families table doesn't satisfy 1NF. The possibility of nulls is often forgotten or ignored in normalization exercises but the avoidance of unnecessary nullable columns is one very good reason for avoiding repeating sets of columns, whether or not you call them "repeating groups".

See also this article.

nvogel
  • 24,981
  • 1
  • 44
  • 82
  • Do you have any reference regarding first paragraph of your answer? – haccks Mar 31 '15 at 15:16
  • 1
    This [tutoral](http://rdbms.opengrass.net/2_Database%20Design/2.2_Normalisation/2.2.4_1NF%20Repeating%20Attributes.html) comes up with another definition for repeating group: *"A repeating group is a domain or set of domains, directly relating to the key, that repeat data across tuples in order to cater for other domains where the data is different for each tuple."* – haccks Mar 31 '15 at 16:31
  • 2
    [Codd1971](http://dl.acm.org/citation.cfm?id=1734716) "3.1 Repeating Groups... in CODASYL fashion a compound group schema which contains a repeating group schema --- see McGee's Chapter 2 in [Codasyl Systems Committee, Feature Analysis of Generalized Data Base Management Systems]". – nvogel Mar 31 '15 at 18:31
  • 1
    Not convinced by the example at opengrass.net. Is that really a "repeating group"? Either it is a picture of a table that isn't a relation at all. OR it is a picture of a relation that presumably satisfies 1NF and should have a composite key. {StudentId} clearly is not the key; {StudentId, UnitCode} might be in which case it appears to satisfy 1NF but not 2NF. – nvogel Mar 31 '15 at 18:42
  • @sqlvogel; In fact that website also claiming that the [first table](http://rdbms.opengrass.net/2_Database%20Design/2.2_Normalisation/2.2.4_1NF%20Repeating%20Attributes.html) is unnormalized. But as per other tutorials and [Wikipedia](http://en.wikipedia.org/wiki/First_normal_form#A_design_that_complies_with_1NF) that table is in 1NF as no repeating group in that table. This is not the only website which is claiming that, there is [another](http://www.sqa.org.uk/e-learning/SoftDevRDS02CD/page_12.htm) website which is claiming same. – haccks Mar 31 '15 at 20:46
  • AFAIK, steps to bring a table in 1NF: **1. Eliminate Repeating Groups.** **2. Make a separate table for each set of related attributes**, and **3. Give each table a primary key**. Most [tutorials](http://www.barrywise.com/2008/01/database-normalization-and-design-techniques/) or books says that a table is in 1NF by just considering the first step and [some](http://db.grussell.org/section008.html#_Toc67114447) of them takes consideration of 3rd step too. opengrass.net considers all three steps and it is convincing over other tutorials (if not considering the definition of repeating grp)...... – haccks Mar 31 '15 at 21:01
  • ....... and now it is frustrating, at least to me, to decide how to make a table in 1NF? – haccks Mar 31 '15 at 21:12
  • 2
    Easy answer: Don't worry too much about 1NF. The critical point is whether the table in question is an accurate representation of a relation (i.e. named and typed attributes; has a key; no nulls). – nvogel Mar 31 '15 at 21:45
  • @MikeSherrill'CatRecall' It's too bad that this answer doesn't address attributes that are (allegedly) "non-atomic", since that is probably actually meant nowadays when one uses/hears "repeating group". – philipxy Apr 10 '17 at 11:06
  • I was studying about this and I am wondering: If I have a table like Person(name, phone1, phone2) it will be a 1NF violation? I have the "same" column repeated (two phones) but on different columns (so not multi-value on a row) – Catarina Nogueira Jun 26 '22 at 14:59
3

the value English is repeating again and again. Is this a repeating group?

No. The multiple appearances of English in SUBJECT_MODULE are not a repeating group or even either of the two things that people mistakenly mean by a repeating group. They are also not evidence of redundancy or lack of normalization. Such multiple appearances might be connected to redundancy or normalization, but they appear all the time when there is no redundancy and various levels of normalization.

If SUBJECT_MODULE is rows where "[SUBJECT_NAME] has [MODULE_NAME] identified by [MODULE_ID]" and a subject might have more than one module then somewhere you must have multiple mentions of that subject (perhaps via its name) with mentions of different modules (perhaps by name or id). That would not involve redundancy.

Student Age Subject

Adam    15  Biology
Adam    15  Maths
Alex    14  Maths
Stuart  17  Maths

The redundancy in this example from your question's second "this" link is not that Adam appears in two rows or that Adam appears with 15 in two rows. It is that if the table is rows where "[Student] is [Age] years old and takes [Subject]" then Student (eg Adam) can appear in multiple rows but always appears with the same Age (eg 15). But if the table were rows where "[Student] has a friend [Age] years old in [Subject]" then the table could be fully normalized already.

Sure it gets rid of the repeating value, but I am not sure if this is the right thing.

It does for your example data, but it might not for other example data. You haven't told us enough. (Anyway as I said above the multiple appearances might not even need normalizing.)

Whether there are any normalization-relevant redundancies in SUBJECT_MODULE or even whether there are any valid decompositions including the one you gave depends on the usual information necessary to normalize to above 1NF. Namely whether some of its columns are functions of others (functional dependencies) and whether its rows are also those where "..." AND "..." (join dependencies).

By giving a possible decomposition you have said that it is also rows where "...[Subject_Name]...[Module_ID]..." AND "...[Module_Name]...[Module_ID]..." And you have given some example decomposition data. But we only know that it could be so decomposed because you added the decomposition. And the decomposition plus data still isn't enough for us to know whether it should be so decomposed.

I have read different tutorials and seen different examples of normalization, specially the notion of "repeating groups" in the first normal form.

"Repeating groups" are something from pre-relational databases and cannot possibly appear in a relational table (relation). They are like a named set of values that is like a field of a record but is not quite. A relational table is always in 1NF. Each column of a row has a single value of the column's type. A non-relational database is "normalized" to tables ie 1NF (first sense of "normalized") which gets rid of repeating groups. Then those tables/relations are "normalized" to higher normal forms (second sense of "normalized").

A relational table having multiple similar columns or having a column type with multiple similar parts are each just reminiscent of having a repeating group in a non-relational database. And the multiple columns and parts should become multiple rows in a separate table, just like the multiple members of a repeating group. But these problems have to do with relational quality of design, not repeating groups or normalization (in either sense) or being relational (ie being in 1NF).

Note that a non-relational database might itself have similar problems with multiple similar fields and/or named sets or with multiple similar parts of values of fields. Normalization to tables does not get rid of these when it gets rid of repeating groups.

Regardless of how they got into a relational design, removing them gives a "better" design. It is just because these design problems are reminiscent of repeating groups that people get confused and imagine that somehow a table could contain a repeating group. So the multiple similar columns and values with multiple similar parts (or the parts) get incorrectly called "repeating groups".

See this answer re "atomicity".

philipxy
  • 14,867
  • 6
  • 39
  • 83