I'm trying to think on the most performant database schema for a specific data structure. There are two main entities: Courses and Themes. A Course is a collection of Themes. A Theme have fields like Videos, Resources and Video Total Time.
Visually representing this data structure:
- Course
|_ ID: 12345
|_ Themes: [A, B] (an array of UIDs)
- Theme A
|_ Courses: [12345,67890] (an array of UIDs)
|_ Videos: [1,2,3,4,5,7] (an array of UIDs)
|_ Resources: [10,11,12] (an array of UIDs)
|_ Video Total Time: 10000 (probably stored as seconds as tinyint field)
- Theme B
|_ Courses: [12345,98765] (an array of UIDs)
|_ Videos: [5,6,7,8] (an array of UIDs)
|_ Resources: [12,13,14] (an array of UIDs)
|_ Video Total Time: 20000 (probably stored as seconds as tinyint field)
What I'm trying to achieve is a database schema for two tables, one for Courses, and one for Themes. The idea would be to have a MySQL query that gets a Course and group all fields from the Themes. In other words, when I get the result of the MySQL query, (using PHP) I'll get an array or object like this:
Array(
'ID' => 12345
'themes' => [A,B]
'videos' => [1,2,3,4,5,6,7,8]
'resources' => [10,11,12,13,14]
'video_total_time' => 30000
)
So, the point is that they are two relational databases. When I send a query to the DB requesting data from the video, I need to pull data from all the themes, and merge them together.
Since I'm not an expert on SQL / MySQL, I'm trying to learn a little bit about it while I try to figure out:
1) What is the best database schema for these two entities? Courses and Themes? Thinking specially about performance
2) Can I get the final data all using SQL? Or should I pull some data from the database, and then parse the data with PHP? What is usually faster?
3) What is the best way to store the array of UIDs? As a string? Or there's a better way to store it?
The primary goal of this is performance. I have this kind of data in a different database schema, merged with thousands of other kinds of data (WP databases, wp_posts / wp_postmeta tables), but right now it's really slow to get the information I need.
Any tips and suggestions are more than welcome!
Edit: Solved!
It was a tough call to decide which answer suits best my needs, because @TimMorton's and @PaulSpiegel's answers lead us to the same path, but with slightly different approaches. Tim's answer is great to understand how to properly design database schemas, taking into account many-to-many relationships, and how to organize your queries. But since the main focus of this question is improve performance, Paul's answer is more focused on that, with specific details about primary keys and indexes (which are fundamental to improve performance of the queries).
Anyways, I learned a lot about designing a database schema. Here's the lessons I learned:
- Don' try to stuff everything into the same table: it is fundamental to identify the entities properly before defining which tables you need. I started with two tables, for Videos and Themes. But turns out that a proper DB schema for my specification includes tables for Videos and Resources.
- Don't store arrays into columns: use a proper strategy to define the relation between entities. If you have one-to-one or one-to-many relationships, use the entities IDs and foreign keys. If you have many-to-many relationships, then the proper design pattern is to create a dedicated table only to create relations between entities. This will allow you to use JOIN clauses into your queries to put all the data together.
- When you think about performance, think about INDEX: depending on your table structure, using either an index or a composite index can improve query performance.
- Don't try to get everything in one big query: you definitely can, but having separate queries for sections of the data you need (on my example, one query to get all themes for a course, one to get all videos for the course, one to get resources for the course) pays off with code organization and readability.
I don't know if I'm correct with everything above, but it's what I learned so far. Hope this helps someone else too.