2

I have a table called ratings with the following fields:

+-----------+------------+------+-----+---------+----------------+
| Field     | Type       | Null | Key | Default | Extra          |
+-----------+------------+------+-----+---------+----------------+
| rating_id | bigint(20) | NO   | PRI | NULL    | auto_increment |
| user_id   | int(11)    | NO   | MUL | NULL    |                |
| movie_id  | int(11)    | NO   |     | NULL    |                |
| rating    | float      | NO   |     | NULL    |                |
+-----------+------------+------+-----+---------+----------------+

Indexes on this table:

+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ratings |          0 | PRIMARY  |            1 | rating_id   | A         |      100076 |     NULL | NULL   |      | BTREE      |         |               |
| ratings |          0 | user_id  |            1 | user_id     | A         |         564 |     NULL | NULL   |      | BTREE      |         |               |
| ratings |          0 | user_id  |            2 | movie_id    | A         |      100092 |     NULL | NULL   |      | BTREE      |         |               |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I have another table called movie_average_ratings which has the following fields:

+----------------+---------+------+-----+---------+-------+
| Field          | Type    | Null | Key | Default | Extra |
+----------------+---------+------+-----+---------+-------+
| movie_id       | int(11) | NO   | PRI | NULL    |       |
| average_rating | float   | NO   |     | NULL    |       |
+----------------+---------+------+-----+---------+-------+

As it is obvious by this point I want to calculate the average rating of movies from ratings table and update the movie_average_ratingstable. I tried the following SQL query.

UPDATE movie_average_ratings
SET average_rating = (SELECT AVG(rating)
                            FROM ratings
                            WHERE ratings.movie_id = movie_average_ratings.movie_id);

Currently, there are around 10,000 movie records and 100,000 rating records and I get Lock wait timeout exceeded; try restarting transaction error. The number of records can grow significantly so I don't think increase timeout is a good solution.

So, how can I write 'scalable' query to acheive this? Is iterating the movie_average_ratings table records and calculate averages individually the most efficient solution to this?

Dilip Raj Baral
  • 3,060
  • 6
  • 34
  • 62

3 Answers3

1

Without an explain, it's hard to be clear on what's holding you up. It's also not clear that you will get a performance improvement by storing this aggregated data as a denormalized table - if the query to calculate the ratings executes in 0.04 seconds, it's unlikely querying your denormalized table will be much faster.

In general, I recommend only denormalizing if you know you have a performance problem.

But that's not the question.

I would do the following:

delete from movie_average_ratings;

insert into movie_average_ratings
Select movie_ID, avg(rating) 
from ratings 
group by movie_id;
Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
0

I just found something in another post:

What is happening is, some other thread is holding a record lock on some record (you're updating every record in the table!) for too long, and your thread is being timed out.

This means that some of your records are locked you can force unlock them in the console:

1) Enter MySQL mysql -u your_user -p

2) Let's see the list of locked tables mysql> show open tables where in_use>0;

3) Let's see the list of the current processes, one of them is locking your table(s) mysql> show processlist;

4) Kill one of these processes mysql> kill put_process_id_here;

Community
  • 1
  • 1
WasteD
  • 758
  • 4
  • 24
  • I know the table is put on lock as long as the update operation is running. My question is what would be a better way to update the table? If the table has records in order of 100,000 or a million, it will take way too long. – Dilip Raj Baral Mar 06 '17 at 10:12
0

You could redesign the movie_average_ratings table to

movie_id (int)
sum_of_ratings (int)
num_of_ratings (int)

Then, if a new rating is added you can add it to movie_average_ratings and calculate the average if needed

  • To calculate the sum I will have to again use `SUM` function. Brings us to square one. After setting up average table I do plan to use "moving average" method to update ratings. But the problem is initial setup of the table. – Dilip Raj Baral Mar 06 '17 at 10:23
  • You don't store individual ratings in movie_average_ratings table! If you get a new rating just add it to sum_of_ratings and increment num_of_ratings – Christopher Gallé Mar 06 '17 at 12:53