6

I'm trying to build a database that will store information on movies.

    Title
    Plot
    Genre
    Rating
    Director

The only thing that is bothering me is that most films don't just have one genre and I'm struggling to figure out how to store this on a MySQL Database. At first I was thinking that I'll just have one table and store all the genres in one column, separating them by a comma and when I want to retrieve them separate them using PHP, but I'm not sure this is the best way as I think I would have trouble sorting and searching for a specific genre e.g. Horror when the collumn contains 'Horror, Thriller, Action'.

Praveen Prasannan
  • 7,093
  • 10
  • 50
  • 70
  • don't forget that films can have http://www.filmsite.org/subgenres.html – Neil McGuigan Jul 05 '13 at 05:17
  • Does this answer your question? [Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) – philipxy Mar 04 '22 at 09:52

4 Answers4

12

I would suggest you should follow the following structure:

tablename: movies

movieid, title, plot, rating, director

> sample data:
> 
> 1 titanic Bollywood   10  James Cameron

tablename: genres

genreid, genre

> sample data:
>  1    Horror
>  2    Thriller
>  3    Action
>  4    Love

tablename: moviegenres

moviegenresid, movieid, genreid

> sample data:
> 1 1   2
> 2 1   4

And the query is:

select m.*,group_concat(g.genre)
from movies m inner join moviegenres mg
on m.movieid=mg.movieid
inner join genres g
on g.genreid=mg.genreid
group by m.movieid
;

See the fiddle

Praveen Prasannan
  • 7,093
  • 10
  • 50
  • 70
  • 1
    I would agree with this answer except for the creation of the arbitrary UIDs `genreID` and `movieID`. In a relational database, having an extra arbitrary key is costly because instead of maintaining and guaranteeing uniqueness of 1 natural key (e.g. `genre`), now you have to maintain 2 unique keys (`genre` AND `genreID`). Unless you have a very good reason to do this, it is unnecessary and costly. Avoiding this also means you no longer need that extra table `genres`, further reducing the cost of maintenance (read: increasing the speed) of a relational database. – Renel Chesak Jul 26 '18 at 12:10
10

What you are looking to model here is called a "many-to-many" relationship, and is very common when modelling "real world" categorisations.

There are many descriptions out there of how to work with such relationships including:

  • Praveen's answer here which is specific to your question.
  • http://en.wikipedia.org/wiki/Junction_table - the extra table linking two populations in many/may relationships is usually called an intersection table or junction table.
  • http://www.tomjewett.com/dbdesign/dbdesign.php?page=manymany.php which helpfully shows an example with the table and key/constraints design, a handy data representation diagram in case that isn't clear, and how the relationship is modelled and used in the application.
  • Any good database design book/tutorial will cover this somewhere.

Do not be tempted to skip the extra intersection table by storing multiple genres in one field for each film (a comma separated list for instance). This is a very common "anti pattern" that will cause you problems, maybe not today, maybe not tomorrow, but eventually. I recommend anyone working with database design give Bill Karwin's "SQL Antipatterns" (http://pragprog.com/book/bksqla/sql-antipatterns) a read. It is written in a way that should be accessible to a relative beginner, but contains much that those of us who should know better need reminding of from time to time (many-to-many relations, the list-in-a-field solution/problem, and what you should do instead, are one of the first things the book covers).

David Spillett
  • 1,401
  • 11
  • 21
5

This answer is an elaboration of my comment on @Praveen Prasannan's answer above.

I would eliminate the arbitrary surrogate keys movieID and genreID as a way to remove unnecessary overhead for the relational database. Since title and genre are natural unique keys, we should use them and not ask the database to maintain the uniqueness of extra, meaningless keys and tables (the genres table in the referenced answer). This should improve the speed and performance for large relational databases and is good practice.

tablename: movies

primary key: title

title, plot, rating, director

> sample data:
> Titanic  Bollywood   10  James Cameron

tablename: moviegenres

primary key: title, genre

title, genre

> sample data:
> Titanic   Thriller
> Titanic   Romance

This also makes queries a lot easier for both the user and the machine since you don't have to join an extra table to decode the genres by the arbitrary UID.

Renel Chesak
  • 577
  • 6
  • 16
0

it's a bit late, but I made a simple code from the top comment code above

select movieid, title, plot, rating, director, group_concat(genre)
from movies
natural join moviegenres
natural join genres;
AinulBedjo
  • 55
  • 2
  • 10