0

I read many articles here. There are many similar questions but I cannot find any one suit my condition. Forgive me as I am a new website developer and I code in ugly way.

I use Laravel 8 and MySQL InnoDB to handle my database. My website is a video information site in multiple Language. Here is my table structure.

My Tables:

videos

  • id (int)
  • code (varchar)
  • name (varchar)
  • date (date)
  • duration (int)
  • director_id (varchar)
  • genre_id (varchar) [e.g. 68#10#185#237#89#340#156]

videos_genres

  • genre_id
  • genre_tw
  • genre_en
  • genre_ja

videos_preview

  • code
  • thumb
  • cover
  • previews

There is 10 groups of genres(c1-c10) and each group has about 100 genres. Each genre has a unique id even they are in different groups. In the video table, I store the video genre id in the form like 68#10#185#237#89#340#156. Then I could get the genre id data back to array using explode "#". Then I could join the genre table with these id.

The filter system is the core function of the website. When people choose multiple genres, they could narrow down the results and precisely get what they want. I use GET method to pass the url request to the VideoController in Laravel like example.com/?c1=68,10&c2=185,237&c7=89,340,156

Here is the search steps:

  1. I put the /?c1=8&c2=33&c7=81 in to array $cArr [68,10,185,237,89,340,156]
  2. Then use the $cArr in the query with multiple LIKE operation
$data = cache()->remember($_SERVER['REQUEST_URI'], 60*60*24, function() use($cArr){
                    return DB::table('videos')
                            ->Where(function ($query) use($cArr) {
                                for ($i = 0; $i < count($cArr); $i++){
                                    $query->where('genre_id', 'like',  $cArr[$i] .'#%');
                                }})
                            ->orWhere(function ($query) use($cArr) {
                                for ($i = 0; $i < count($cArr); $i++){
                                    $query->where('genre_id', 'like', '%#' . $cArr[$i]);
                                }})
                            ->orWhere(function ($query) use($cArr) {
                                for ($i = 0; $i < count($cArr); $i++){
                                    $query->where('genre_id', 'like', '%#' . $cArr[$i] .'#%');
                                }})
                            ->leftjoin('videos_preview','videos_preview.code','=','videos.code')
                            ->orderBy('videos.publish_date', 'DESC')
                            ->limit(400)->get();

The will generate a slow query like the following. It took about 10s to run to search through 300K rows.

select * from `videos` left join `videos_preview` on `videos_preview`.`code` = `videos`.`code` 
where (`genre_id` like '68#%' and `genre_id` like '10#%' and `genre_id` like '185#%' and `genre_id` like '237#%' and `genre_id` like '89#%' and `genre_id` like '340#%' and `genre_id` like '156#%') 
or (`genre_id` like '%#68' and `genre_id` like '%#10' and `genre_id` like '%#185' and `genre_id` like '%#237' and `genre_id` like '%#89' and `genre_id` like '%#340' and `genre_id` like '%#156') 
or (`genre_id` like '%#68#%' and `genre_id` like '%#10#%' and `genre_id` like '%#185#%' and `genre_id` like '%#237#%' and `genre_id` like '%#89#%' and `genre_id` like '%#340#%' and `genre_id` like '%#156#%') order by `videos.publish_date` desc limit 400;

I have a 6GB Ram and 6CPU cores VPS. But as the traffic increase recently (about 500 visitors simultaneously) and the database grows 300+ row every day. I just found that the MySQL query eat up my CPU to 100%. As you can see I already cache the result for 24 hours but there are too many combinations for multiple genres. Most of the combination is first time show up within 24 hours which in not cached.

Please help me. Is there a better way to archive the same filter function in better way in Laravel 8? Thank you in advance for every life safer. And sorry for my bad English.

Philip
  • 63
  • 1
  • 7
  • 1
    You should look at normalizing the database, `genre_id` is probably the biggest part of your problem and with no validation of the ID's can get worse - https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad – Nigel Ren May 11 '21 at 17:40
  • 1
    `like '%...'` is an absolute performance killer. As above, normalise your schema – Strawberry May 11 '21 at 17:48
  • Thank you for your advise. I decided to remake the database structure. I will abandon the comma-separated lists. Do you think using the many to many model will be much better? – Philip May 12 '21 at 02:20
  • I am trying to build some table combination like `videos` `genres` `genre_video`. But should I build tables for every `actor`, `publisher`, `series`... Seem like I would have to create 20+ tables – Philip May 12 '21 at 02:25
  • 1
    Yes, actors, publishers and other entities can be shared across many videos. You may be able to get away with have some of them combined into a single table with a type_id column which could have values of 'actor', 'publisher', 'series' etc. But it really depends on how much detail you need to capture for actor publisher and series, if you're only storing the name/title of them then you should be ok. – thursday_dan May 12 '21 at 08:03

1 Answers1

0
  • The ANDs and ORs are terribly messed up. Rethink.
  • LIKE '%...' has to check every row
  • OR has to check every row

This would have to check every row, but it would be faster:

WHERE FIND_IN_SET(genre, '68,10,185,237,89,340,156')

Note that commas are required for this. This would check that genre is one of those numbers. Do you want to AND together the user-supplied genre tests? Or OR them?

-- Both 185 and 10:
WHERE FIND_IN_SET(185, '68,10,185,237,89,340,156')
  AND FIND_IN_SET( 10, '68,10,185,237,89,340,156')

-- Both 185 or 10:
WHERE FIND_IN_SET(185, '68,10,185,237,89,340,156')
   OR FIND_IN_SET( 10, '68,10,185,237,89,340,156')

Another approach is to use FULLTEXT indexing:

-- The column `genre` might be "mystery drama documentary comedy"
WHERE MATCH(`genre`) AGAINST ("+comedy +musical" IN BOOLEAN MODE)

That would run a lot faster because it creates an inverse index of the words. The Against says that it must be both a comedy and a musical. (Hence that sample genre would not match.)

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thank you for the detailed answer. I have tried the FIND_IN_SET method. The speed seems similar. I think the root cause is my LIKE operator. And my database is innodb so I may not able to use FULLTEXT. I will re design my database structure to avoid those LIKE. Thank you again for the help. – Philip May 12 '21 at 02:23
  • @Philip - FULLEXT was added to InnoDB in 5.6. What version are you using? – Rick James May 12 '21 at 03:31