1

I'm trying to think on the most performant database schema for a specific data structure. There are two main entities: Courses and Themes. A Course is a collection of Themes. A Theme have fields like Videos, Resources and Video Total Time.

Visually representing this data structure:

- Course
|_ ID: 12345
|_ Themes: [A, B] (an array of UIDs)

- Theme A
  |_ Courses: [12345,67890] (an array of UIDs)
  |_ Videos: [1,2,3,4,5,7] (an array of UIDs)
  |_ Resources: [10,11,12] (an array of UIDs)
  |_ Video Total Time: 10000 (probably stored as seconds as tinyint field)
- Theme B
  |_ Courses: [12345,98765] (an array of UIDs)
  |_ Videos: [5,6,7,8] (an array of UIDs)
  |_ Resources: [12,13,14] (an array of UIDs)
  |_ Video Total Time: 20000 (probably stored as seconds as tinyint field)

What I'm trying to achieve is a database schema for two tables, one for Courses, and one for Themes. The idea would be to have a MySQL query that gets a Course and group all fields from the Themes. In other words, when I get the result of the MySQL query, (using PHP) I'll get an array or object like this:

Array(
  'ID' => 12345
  'themes' => [A,B]
  'videos' => [1,2,3,4,5,6,7,8]
  'resources' => [10,11,12,13,14]
  'video_total_time' => 30000
)

So, the point is that they are two relational databases. When I send a query to the DB requesting data from the video, I need to pull data from all the themes, and merge them together.

Since I'm not an expert on SQL / MySQL, I'm trying to learn a little bit about it while I try to figure out:

1) What is the best database schema for these two entities? Courses and Themes? Thinking specially about performance

2) Can I get the final data all using SQL? Or should I pull some data from the database, and then parse the data with PHP? What is usually faster?

3) What is the best way to store the array of UIDs? As a string? Or there's a better way to store it?

The primary goal of this is performance. I have this kind of data in a different database schema, merged with thousands of other kinds of data (WP databases, wp_posts / wp_postmeta tables), but right now it's really slow to get the information I need.

Any tips and suggestions are more than welcome!


Edit: Solved!

It was a tough call to decide which answer suits best my needs, because @TimMorton's and @PaulSpiegel's answers lead us to the same path, but with slightly different approaches. Tim's answer is great to understand how to properly design database schemas, taking into account many-to-many relationships, and how to organize your queries. But since the main focus of this question is improve performance, Paul's answer is more focused on that, with specific details about primary keys and indexes (which are fundamental to improve performance of the queries).

Anyways, I learned a lot about designing a database schema. Here's the lessons I learned:

  • Don' try to stuff everything into the same table: it is fundamental to identify the entities properly before defining which tables you need. I started with two tables, for Videos and Themes. But turns out that a proper DB schema for my specification includes tables for Videos and Resources.
  • Don't store arrays into columns: use a proper strategy to define the relation between entities. If you have one-to-one or one-to-many relationships, use the entities IDs and foreign keys. If you have many-to-many relationships, then the proper design pattern is to create a dedicated table only to create relations between entities. This will allow you to use JOIN clauses into your queries to put all the data together.
  • When you think about performance, think about INDEX: depending on your table structure, using either an index or a composite index can improve query performance.
  • Don't try to get everything in one big query: you definitely can, but having separate queries for sections of the data you need (on my example, one query to get all themes for a course, one to get all videos for the course, one to get resources for the course) pays off with code organization and readability.

I don't know if I'm correct with everything above, but it's what I learned so far. Hope this helps someone else too.

Marcin Orlowski
  • 72,056
  • 11
  • 123
  • 141
Diego de Oliveira
  • 681
  • 1
  • 8
  • 23
  • It would be a mistake to think of this as two tables. I see themes, courses, videos, and resources. A clue is that any time you see an array, think table. For example, videos: a video would have in id , title, and running time. You wouldn’t need a field called video total time; it would simply be summed in your query. – Tim Morton Mar 22 '19 at 20:55
  • Hey @TimMorton thanks for the comment. So you think that the most performant way to get all the data that I need is to break down the information into multiple tables? The idea of having the video_total_time field for themes, would be to get this information faster. But since I'm a newbie on SQL / MySQL, maybe querying the database for all videos on the theme, and them do a sum of all video times, would not be as slow as I think? – Diego de Oliveira Mar 22 '19 at 21:05
  • See [is-storing-a-delimited-list-in-a-database-column-really-that-bad](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad). You might then realize that you are designing a performance disaster. – Paul Spiegel Mar 22 '19 at 21:51
  • Hey @PaulSpiegel thanks for the link, really informative. Like I said, I'm a newbie on designing a proper database schema. How would you do these relational fields? Should I build a relational table for each combination? IOW: one table for course <=> themes, one for theme <=> videos, one for theme <=> resources? – Diego de Oliveira Mar 22 '19 at 22:04
  • 1
    A normalized schema would have one table per entity and one table per many-to-many relation. One-to-many relations are solved with a reference (foreign key) in the child table (so no additional tables for those relations). As far as I see, you have no many-to-many relations. So you'd need four tables: `courses`, `themes`, `video` and `ressources`. – Paul Spiegel Mar 22 '19 at 22:23
  • If I'm reading this right (based on the desired result), a course can have many themes, a course can have many videos, a course can have many resources. However, looking at the visual representation of data, a theme can also have many videos, resources, and courses. That would mean a many to many relationship between themes and courses, wouldn't it? And then the question is: are videos tied to a theme or to a course, or both? Likewise with resources... – Tim Morton Mar 22 '19 at 22:42
  • @TimMorton you're right, it's a many to many relationship. A course can have many themes, and a theme can have many courses. Also, a theme can have many vídeos and resources, and a vídeo/resource can have many themes. I'll update the visual representation to mais this more clear. – Diego de Oliveira Mar 23 '19 at 03:47
  • I’ll update my answer as I have time, although between Paul’s and my answer, you should have enough info to extrapolate what you need. – Tim Morton Mar 23 '19 at 15:36

3 Answers3

1

In it's simplest form, assuming no many to many relationships:

Course                Theme
--------              --------
CourseID <--+         ThemeId
Name        |         Name
            +------   CourseID
            |
            |      
            |         Video
            |         --------
            |         VideoID
            |         Name
            |         Length
            +------   CourseID
            |
            |
            |         Resource
            |         --------
            |         ResourceID
            |         Name
            +------   CourseID

In this form, a Course can have many themes, many videos, and many resources; but each theme, video, and resource can have only one course.

However, I don't think that's how you want it.

I would lean more towards

                      Course             Theme
                      --------           --------
            +---->    CourseId    +--->  ThemeId
            |         Name        |      Name
            |         ThemeId ----+      
            |
            |      
            |         Video
            |         --------
            |         VideoID
            |         Name
            |         Length
            +------   CourseID
            |
            |
            |         Resource
            |         --------
            |         ResourceID
            |         Name
            +------   CourseID

This allows a course to have only one theme, but many videos and resources. This allows the themes to have more than one course.

But it still doesn't quite fit the bill...

This one allows many courses to share the same theme, as well as have more than one theme:

                      Course         Course_Theme      Theme
                      --------       ------------      --------
            +---->    CourseId <----- CourseId   +-->  ThemeId
            |         Name            ThemeId ---+     Name
            |         ThemeId       
            |
            |      
            |         Video
            |         --------
            |         VideoID
            |         Name
            |         Length
            +------   CourseID
            |
            |
            |         Resource
            |         --------
            |         ResourceID
            |         Name
            +------   CourseID

As this stands now, each course can have many themes, videos, and resources. Each theme can have many courses. Each video and resource belongs to a course (i.e., can have only one course)

If a video or resource can be for more than one course, then you'll have to expand it just as I did with themes.


As per comment, everything is many to many. Notice I don't have any direct relations between themes and videos nor themes and resources. I don't think they will be necessary; you should be able to pick up what you need going through courses.

          Course         Course_Theme        Theme
          --------       ------------        --------
+---->    CourseId <---- CourseId                   
|         Name           ThemeId ----------> ThemeId
|                                            Name
|
|                        Course_Video        Video
|                        ------------        --------
+----------------------  CourseId                      
|                        VideoId ----------> VideoId
|                                            Name         
|                                            Length             
|                                                         
|                        Course_Resource     Resource
|                        ---------------     --------     
+----------------------- CourseId                        
                         ResourceId -------> ResourceId   
                                             Name         
                                             Url, etc.    

Now for the queries. Although it is possible to use aggregate functions along with group by, I think it makes far more sense to keep it simple and just pull things out one at a time.

Themes per course
SELECT T.* 
FROM COURSE C
INNER JOIN COURSE_THEME CT ON CT.COURSEID=C.COURSEID
INNER JOIN THEME T ON CT.THEMEID=T.THEMEID 
WHERE {insert your search conditions on course}

or, if you know CourseId:

SELECT T.*
FROM THEME T
INNER JOIN COURSE_THEME CT ON T.THEMEID = CT.THEMEID
WHERE CT.COURSEID = ?


likewise,

Videos per course
SELECT V.*
FROM COURSE C
INNER JOIN COURSE_VIDEO CV ON CV.COURSEID=CV.COURSEID
INNER JOIN VIDEO ON CV.VIDEOID=V.VIDEOID
WHERE {insert your search conditions on course}

or, if you know the CourseId:

SELECT V.*
FROM VIDEO V
INNER JOIN COURSE_VIDEO CV ON CV.VIDEOID = V.VIDEOID
WHERE CV.COURSEID = ?

to select the sum of the video lengths per course,

SELECT SUM(LENGTH) AS TOTAL
FROM VIDEO
INNER JOIN COURSE_VIDEO CV ON CV.VIDEOID = V.VIDEOID
WHERE CV.COURSEID = ?
GROUP BY CV.COURSEID

Now, the tricky part is videos per theme.  I am making an assumption here:  the set of videos per theme is the same as the set of videos per course per theme.

The long way around:

SELECT V.*
FROM VIDEO V
INNER JOIN COURSE_VIDEO CV ON VIDEO.VIDEOID = CV.VIDEOID
INNER JOIN COURSE C ON COURSEID = CV.COURSEID
INNER JOIN COURSE_THEME CT ON C.COURSEID = CT.COURSEID
INNER JOIN THEME T ON CT.THEMEID = T.THEMEID
WHERE THEMEID = ?

Blech. You can cut out the middlemen:

SELECT V.*
FROM VIDEO V
INNER JOIN COURSE_VIDEO CV ON VIDEO.VIDEOID = CV.VIDEOID
INNER JOIN COURSE_THEME CT ON CV.COURSEID = CT.COURSEID
WHERE CT.THEMEID = ?

When you have your tables normalized, you can get any piece of information from whatever starting point you choose. FWIW, your example is a fairly complicated one since everything is many to many relations.


Update

Even though I had courses as the root, even when themes are the root things don't change much:

          Theme          Course_Theme        Course
          --------       ------------        --------
+---->    ThemeId <----  ThemeId                   
|         Name           CourseId ---------> CourseId
|                                            Name
|
|                        Theme_Video         Video
|                        ------------        --------
+----------------------  ThemeId                      
|                        VideoId --------->  VideoId
|                                            Name         
|                                            Length             
|                                                         
|                        Theme_Resource      Resource
|                        --------------      --------     
+----------------------- ThemeId                        
                         ResourceId ------>  ResourceId   
                                             Name         
                                             Url, etc.    

In this configuration, courses have videos and resources through ThemeId, i.e.:

SELECT V.*
FROM COURSE_THEME CT 
INNER JOIN VIDEO_THEME VT ON VT.THEMEID = CT.THEMEID
INNER JOIN VIDEO V ON V.VIDEOID = VT.VIDEOID
WHERE CT.THEMEID = ?
Tim Morton
  • 2,614
  • 1
  • 15
  • 23
  • TimMorton great, I really like how you detailed everything. But @PaulSpiegel answer got It right about themes. They are actually the main piece of the puzzle. A Theme is a collection of Videos and Resources. And a Course is a collection of Themes. On this project, you build a Course by choosing a group of Themes. Here you connected Vídeos and Resources directly to Courses. Sorry, it's my fault, I should have been done a better job explaining everything. Can you update your answer with these requirements? – Diego de Oliveira Mar 24 '19 at 01:29
1

Creating the schema

Step 1: Identify entities and their attributes

  • Course (ID, title, description)
  • Theme (ID, title, description)
  • Video (ID, title, description, duratation)
  • Ressource (ID, title, url)

Step 2: Identify relations

  • Theme => Course
  • Video => Theme
  • Ressource => Theme

Step 3: Create tables

  • courses
    • ID (PK)
    • title
    • description
  • themes
    • ID (PK)
    • course_id (FK)
    • title
    • description
  • videos
    • ID (PK)
    • theme_id (FK)
    • title
    • description
    • duratation
  • ressources
    • ID (PK)
    • theme_id (FK)
    • title
    • url

If themes can share videos and ressources, then it would be many-to-many relations. In this case you would need separate tables for those relations. Remove the theme_id column from videos and ressources and add the following tables:

  • themes_videos
    • theme_id (PK) (FK)
    • video_id (PK) (FK)
  • themes_ressources
    • theme_id (PK) (FK)
    • ressource_id (PK) (FK)

Here you should define composite primary keys on (theme_id, video_id) and (theme_id, ressource_id). Also create reverse indexes on (video_id, theme_id) and (ressource_id, theme_id).

Retrieving data

Assuming you know the ID of the course (which is 123), you can then retrieve the related data (from the many-to-many schema) with the following queries (which you execute one by one):

select c.*
from courses c
where c.id = 123;

select t.*
from themes t
where t.course_id = 123;

select distinct v.*
from themes t
join themes_videos tv on tv.theme_id = t.id
join videos v on v.id = tv.video_id
where t.course_id = 123;

select distinct r.*
from themes t
join themes_ressources tr on tr.theme_id = t.id
join ressources r on r.id = tr.ressource_id
where t.course_id = 123;

Then compose your array/object from retrieved data in PHP.

Performance

Trying to get all data with a single SQL query is not always a good idea. You just make your code and schema too complicated. Executing a couple of queries is not the end of the world. What you should avoid, is running executing a query in a loop (like: for each theme select related videos).

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
  • Good tip on performance. It *is* possible to get it all in one query (assuming MySQL has aggregate functions), but it would be... hairy. Better to keep it simple so you can explain it in the code for the next guy. – Tim Morton Mar 23 '19 at 00:07
  • @paulSpiegel thanks for the response, very helpful! Can I ask some clarification about the many-to-many tables? For example, in `themes_videos`, you say that both `theme_id` an `video_id` are primary keys and also foreign keys, right? So, should I set the table to add these as primary keys, and ALSO create the composite primary key? Also, I heard that reversed indexes only works for MySQL >= 8.0. If I'm using a older version (like 5.x), is it possible to create the reverse index? – Diego de Oliveira Mar 26 '19 at 06:03
  • @DiegodeOliveira I mean **one** primary key (per table) with two columns. Eg. `PRIMARY KEY (theme_id, video_id), INDEX (video_id, theme_id)`. With reverse index I don't mean `DESC`, but reversed column order. – Paul Spiegel Mar 27 '19 at 14:43
  • @PaulSpiegel oh, got it! Thanks for the explanation! Now it makes more sense to me! – Diego de Oliveira Mar 27 '19 at 22:24
-1

Table Structure

Make tables like image as shown and use json encode/decode time of input/out. In the query you can have total time from the table.

Community
  • 1
  • 1
Kashem Ali
  • 39
  • 3
  • 1
    I personally would normalize it more than that. You're saving foreign keys in a varchar, presumably as a csv? That's not at all useful in a relational database. – Tim Morton Mar 22 '19 at 21:38