1

I'm new to databases and trying to understand why a junction or association table is needed when creating a many-to-many relationship.

Most of what I'm finding on Stackoverflow and elsewhere describe it in either highly technical relational theory terms or it's just described as 'that's the way it's done' without qualifying why.

Are there any relational database designs out there that support having a many-to-many relationship without the use of an association table? Why is it not possible to have, for example, a column on on table that holds the relationships to another and vice a versa.

For example, a Course table that holds a list of courses and a Student table that holds a bunch of student info — each course can have many students and each student can take many classes.

Why is it not possible to have a column on each row in either table (possibly in csv format) that contains the relationships to the others in a list or something similar?

AdjunctProfessorFalcon
  • 1,790
  • 6
  • 26
  • 62

3 Answers3

2

In a relational database, no column holds more than a single value in each row. Therefore, you would never store data in a "CSV format" -- or any other multiple value system -- in a single column in a relational database. Making repeated columns that hold instances of the same item (Course1, Course2, Course3, etc) is also not allowed. This is the very first rule of relational database design and is referred to as First Normal Form.

There are very good reasons for the existence of these rules (it is enormously easier to verify, constrain, and query the data) but whether or not you believe in the benefits the rules are, none-the-less, part of the definition of relational databases.

Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
  • It is possible to work around that constraint, but it's just a terrible idea basically? http://stackoverflow.com/questions/3070384/how-to-store-a-list-in-a-column-of-a-database-table – AdjunctProfessorFalcon Jan 02 '16 at 02:11
  • 3
    That's not "working around" a "constraint", it's *violating* a principle. There's nothing to stop you from stuffing a "list" of comma-separated values into a text column, just like there's nothing to stop you from giving every variable and function in your program a single-character name chosen from some incomprehensible section of the unicode character set. But I don't consider that working around the "constraint" that you're expected to use descriptive symbol names I would consider it deliberately shooting yourself in the foot. – Larry Lustig Jan 02 '16 at 02:31
1

I do not know the answer to your question, but I can answer a similar question: Why do we use a junction table for many-to-many relationships in databases?

First, if the student table keeps track of which courses the student is in and the course keeps track of which students are in it, then we have duplication. This can lead to problems. What if a student knows it is in a course, but the course doesn't know that it has that student. Every time you made a course change you would have to make sure to change it in both tables. Inevitably this will not happen every time and the data will become inconsistent.

Second, where would we store this information? A list is not a possible type for a field in a database. So do we put a course column in the student table? No, because that would only allow each student to take one course, a many-to-one relationship from students to courses. Do we put a student column in the courses table? No, because then we have one student in each course.

What does work is having a new table that has one student and one course per row. This tells us that a student is in a class without duplicating any data.

skyler
  • 1,487
  • 1
  • 10
  • 23
1

"Junction tables" come from ER/ORM presentations/methods/products that don't really understand the relational model.

In the relational model (and in original ER information modeling) application relationships are represented by relations/tables. Each table holds tuples of values that are in that relationship to each other, ie that are so related, ie that satisfy that relationship, ie that participate in the relationship.

A relationship is expressed independently of any particular situation as a predicate, a fill-in-the-(named-)blanks statement. Rows that fill in the named blanks to give a true statement from the predicate in a particular situation go in the table. We pick sufficient predicates (hence base tables) to describe every situation. Both many-to-1 and many-to-many application relationships get tables.

The reason why you don't see a lot of many-to-many relationships along with columns about the participants rather than about their participation in the relationship is that such tables are better split into ones about the participants and one for the relationship. Eg columns in a many-to-many table that are about participants 1. can't say anything about entities that don't participate and 2. say the same thing about an entity every time it participates. Information modeling techniques that focus on identifying independent entity types first then relationships between them tend to lead to designs with few such problems. The reason why you don't see many-to-many relationships in two tables is that that is redundant and susceptible to the error of the tables disagreeing. The problem with collection-valued columns (sequences/lists/arrays) is that you cannot generically query about their parts using usual query notation and implementation because the DBMS doesn't see the parts organized into a table.

See this recent answer or this one.

Community
  • 1
  • 1
philipxy
  • 14,867
  • 6
  • 39
  • 83