0

I need to make connections between several tables. And I need to decide whether I should make several separate tables with only two ids or one table with all the connections.

The tables are:

  • movie - 50000 rows
  • photo_set - 50000 rows - each photo set is related to a movie, but there can be some photosets without movies or movies without photosets
  • dvd - 13000 rows - there can be obly 3-5 movies on each DVD, some movies are not connected to any DVD
  • category - 50 rows - there can be 1-5 categories for each movie. Category for a all movies and their photosets in one DVD are the same
  • actor - 15000 rows - there can be 2-5 actors in one movie

The question is: how to calculate the number of rows in a table with such fields?

  • movie_id (default=NULL)
  • photo_set_id (default=NULL)
  • dvd_id (default=NULL)
  • category_id
  • actor_id

I think that if I will use only one table with all the connections it might be much faster to get all the connections in one query. Of course if this one table will be not very much larger than several connection tables.

If this is not a good idea to use one table, please tell me, what will be the advantage of using several connection tables in my scenario?

Stedy
  • 7,359
  • 14
  • 57
  • 77
Eugene
  • 655
  • 7
  • 16

3 Answers3

2

You should make several connection tables, one for each many-to-many relationship.

The reason for having several connection tables is the same as for having several data tables. You could store movies, photos and so on in the same table. But you don't, because when designing your database you strive for the third normal form. Each kind of object should have it's own table.

The relation between movie and actor is not the same as the relation between movie and photoset. Let's say that you want to add attributes to the relationships. Maybe you want to store which character the actor played in he movie, or maybe you want to name the photosets ("Pictures from the Oscar awards", "Screenshots from the movie" etc).

My point is that it will be easier to extend the application if you make one table per relationship.

Estimating the number of rows:

The number of rows in each table will be the number of rows in the driving table times the average number of connected items.

Example: Let's assume the average number of actors is 3 per movie. Then there will be 50000 * 3 = 150000 rows in the movie_actor table.

Klas Lindbäck
  • 33,105
  • 5
  • 57
  • 82
  • But what will be the advanage of using serveral connection tables in my scenario? I think that if I will use only one table with all the connections it might be much faster to get all the connections in one query. Ofcourse if this one table will be not very much larger than several connection tables. – Eugene Nov 11 '11 at 10:02
  • What about indexes? If you put them all in the same table it will be hard to define an index that is good for all types at the same time. If they are in separate tables the optimiser can use different indexes for each connection type. – Klas Lindbäck Nov 11 '11 at 11:26
  • I think all queries will be always only for by one field. So there can be one Unique index for all fields and one index for each field. I don't know if this will be a problem for an optimizer. – Eugene Nov 11 '11 at 12:17
  • I have added the reason for why you should have several tables in my answer, and it's not performance. – Klas Lindbäck Nov 11 '11 at 15:58
0

For calculation, you can use averages:

If you have on average 2,5 actor per movie your actor-per-movie table will have 50k * 2,5 = 125k rows

The largest one might be movie<>photo_set/

Well, judging from these number any decent RDBMS will not give you any trouble when it comes to the number of records in the tables. So you can go for the schema you find the most convenient.

I would use several ones. It makes your design more flexible and easier to extend (if you want to maintain information about the connections. It also has advantages indeed for indexing like Klas wrote.

Pleun
  • 8,856
  • 2
  • 30
  • 50
  • Actually movie<>photo_set is almost one to one connection. A movie will always have maximum one photoset and photoset can have maximum one movie. It will be quite rare situation when a movie don't have a a photoset or photoset don't have a movie. I don't fear there can be too many rows that it will be not supported by RDBMS, I just want to know whether it will be more effective to use one table of connections insted of serveral. – Eugene Nov 11 '11 at 10:21
0

In MySQL, the max number of rows is limited by the tables size and MAX_ROWS table option. The table size depends on the OS and file system Hence You need not to worry about the number of rows you have now.

Win32 w/ FAT/FAT32 - 2GB/4GB
Win32 w/ NTFS - 2TB (possibly larger)
Linux 2.2-Intel 32-bi - 2GB (LFS: 4GB)
Linux 2.4+(using ext3 file system) 4TB
Solaris 9/10 - 16TB
MacOS X w/ HFS +2TB

MAX_ROWS table option is software limit and that can be change later on. If you are about to store a lot of rows set MAX_ROWS accordingly.Further You Can Visit

StackOverFlow Link

Full Table

One Large Table Vs few Small Tables

Community
  • 1
  • 1
ScoRpion
  • 11,364
  • 24
  • 66
  • 89