1

I want to find for each genre of movie, find the N actors who have played in most movies of the genre

Tables and their columns:

actor(actor_id,name)
role(actor_id,movie_id)
movie(movie_id,title)
movie_has_genre(movie_id,genre_id)
genre(genre_id,genre_name)

With this query I can find the actors who played on the most movies of the same genre.

select t1.genre_name, t1.actor_id, t1.max_value
from
(
    select g.genre_name, a.actor_id, count(*) as max_value
    from genre g
    inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
    inner join movie m on mhg.movie_id = m.movie_id
    inner join role r on m.movie_id = r.movie_id
    inner join actor a on a.actor_id = r.actor_id
    group by g.genre_name, a.actor_id
) t1
inner join
(
    select genre_name, MAX(max_value) AS max_value
    from
    (
        select g.genre_name, a.actor_id, count(*) as max_value
        from genre g
        inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
        inner join movie m on mhg.movie_id = m.movie_id
        inner join role r on m.movie_id = r.movie_id
        inner join actor a on a.actor_id = r.actor_id
        group by g.genre_name, a.actor_id
    ) t
    GROUP BY genre_name
) t2
    ON t1.genre_name = t2.genre_name and t1.max_value = t2.max_value
ORDER BY
    t1.max_value desc;

But I want to limit the number of the actors to 1.So how can I do that?

Example:

Results I get:

genre_name | actor_id | max_value
==================================
Thriller   |  22591   |   7
Drama      |  22591   |   6
Crime      |  65536   |   3
Horror     |  22591   |   3
Action     |  292028  |   3
Action     |  378578  |   3
Action     |  388698  |   3

Results I want:

 genre_name | actor_id | max_value
==================================
Thriller    | 22591    | 7
Drama       | 22591    | 6
Crime       | 65536    | 3
Horror      | 22591    | 3
Action      | 292028   | 3
Jonathan Hall
  • 75,165
  • 16
  • 143
  • 189
stef
  • 121
  • 10
  • 1
    You need to define the logic that selects 1 actor if you have a tie. That's a business logic decision, not a programming one. Once you made up your mind, it will be a lot easier to implement it. – Shadow Jun 01 '18 at 12:21
  • Ok lets say I want to limit it by the actor_id being the minimum.(pretty much I pick the guy with 933 actor_id over the guy with 2591 actor_id).My problem is where should I implement it and how cause I am searching for hours and I couldnt find something. – stef Jun 01 '18 at 12:27
  • there is no downvote on this question at all, so no idea what you are talking about. I did not downvote the other question you asked earlier either. Just closed it as being a duplicate. The two are not the same. – Shadow Jun 01 '18 at 14:55

4 Answers4

0

If you want just one actor selected randomly just add the following line to your code:

 select genre_name, actor_id, max_value
 from
 (
     select g.genre_name, a.actor_id, count(*) as max_value
     from genre g
     inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
     inner join movie m on mhg.movie_id = m.movie_id
     inner join role r on m.movie_id = r.movie_id
     inner join actor a on a.actor_id = r.actor_id
     group by g.genre_name, a.actor_id
 ) t1
 inner join
 (
     select genre_name, MAX(max_value) AS max_value
     from
     (
         select g.genre_name, a.actor_id, count(*) as max_value
         from genre g
         inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
         inner join movie m on mhg.movie_id = m.movie_id
         inner join role r on m.movie_id = r.movie_id
         inner join actor a on a.actor_id = r.actor_id
         group by g.genre_name, a.actor_id
     ) t
     GROUP BY genre_name
 ) t2
     USING(genre_name,max_value)
 GROUP BY genre_name, max_value
 ORDER BY max_value desc;
A. Colonna
  • 852
  • 7
  • 10
  • Tried it above order by and I had an error with group by – stef Jun 01 '18 at 12:34
  • Error Code: 1055. Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 't1.actor_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by 0.031 sec – stef Jun 01 '18 at 12:55
  • I changed it too to this – stef Jun 01 '18 at 12:56
  • and I had the same problem – stef Jun 01 '18 at 12:56
  • Error Code: 1055. Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 't1.actor_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by 0.031 sec – stef Jun 01 '18 at 13:04
  • You need a min() on the actor id in the outermost query, if you want to get the minimum – Shadow Jun 01 '18 at 15:01
0

Some of the joins you used are redundant.

SELECT 
  U.genre_name, U.actor_id, U.actor_genre_count
FROM
    (SELECT 
      A.genre_id, A.genre_name, C.actor_id, count(*) actor_genre_count
    FROM genre A 
    JOIN movie_has_genre B
    ON A.genre_id=B.genre_id 
    JOIN role C 
    ON C.movie_id=B.movie_id
    GROUP BY A.genre_id, A.genre_name, C.actor_id) U
JOIN 
   (SELECT 
       S.genre_id, S.genre_name, MAX(S.actor_genre_count) max_actor_genre
    FROM
       (SELECT 
           A.genre_id, A.genre_name, C.actor_id, count(*) actor_genre_count
        FROM genre A
        JOIN movie_has_genre B
        ON A.genre_id=B.genre_id 
        JOIN role C 
        ON C.movie_id=B.movie_id
        GROUP BY A.genre_id, A.genre_name, C.actor_id) S
    GROUP BY S.genre_id, S.genre_name) V
ON U.genre_name=V.genre_name AND U.actor_genre_count=V.max_actor_genre;
cdaiga
  • 4,861
  • 3
  • 22
  • 42
0

This solution is adapted from this Stack Overflow answer about limiting results by name. I attempted to do a similar query that should choose the first actor_id and only return it.

SELECT id, CategoryName, image, date_listed, item_id
SELECT t1.genre_name, t1.actor_id, t1.actor_movie_count
FROM
    (
    SELECT g.genre_name, r.actor_id, COUNT(*) as actor_movie_count    
    FROM genre g
    INNER JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
    INNER JOIN role r ON m.movie_id = r.movie_id
    GROUP BY g.genre_name, r.actor_id
    ) t1
LEFT JOIN
    (
    SELECT genre_name, actor_id, MAX(actor_movie_count) AS max_actor_movie_count
    FROM
         (
         SELECT g.genre_name, r.actor_id, COUNT(*) AS actor_movie_count
         FROM genre g
         INNER JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
         INNER JOIN role r ON m.movie_id = r.movie_id
         GROUP BY g.genre_name, r.actor_id
         )
    GROUP BY genre_name
    ) t2
ON t1.genre_name = t2.genre_name AND t1.actor_movie_count = t2.max_actor_movie_count AND (t1.actor_id > t2.actor_id)
WHERE t2.genre_id IS NULL
ORDER BY t1.actor_movie_count DESC

If this still doesn't solve your problem, other similar questions with explanations are described below:

SO answer about returning 1 row per group

SO question about limiting query answer to N results per group

SO question about selecting N items per category

External Article: Finding the max/first of a particular group in SQL

matt123788
  • 419
  • 5
  • 16
  • I have tried it but I cant make it work.Also distinct doesnt work cause every result I take is unique – stef Jun 01 '18 at 14:14
0

You can use a correlated LIMIT 1 subquery to get the id of the actor who played that genre most.

select g.genre_name, (
        select r.actor_id
        from movie_has_genre mg
        join role r on r.movie_id = mg.movie_id
        where mg.genre_id = g.genre_id
        group by r.actor_id
        order by count(*) desc,
                 r.actor_id asc -- on tie least actor_id wins
    ) as actor_id
from genre g

The result would be like:

genre_name | actor_id
======================
Thriller   | 22591   
Drama      | 22591   
Crime      | 65536   
Horror     | 22591   
Action     | 292028 

As you see, the count is not included. If you need the count, the simple way would be to return it in the same string column with actor_id

Change the SELECT clause in the subquery to

select concat(r.actor_id, ':', count(*)) as actor_id_count

This wil return the actor_id and the count in a single string column like

genre_name | actor_id_count
===========================
Thriller   |  22591:7

You can then parse it (with split, explode or what ever) in your application code.

A solution with CTE (Common Table Expression) and ROW_NUMBER() (window functions) (supported by MySQL 8 and MariaDB 10.2) could be:

with cte as (
    select g.genre_name, r.actor_id, count(*) as max_value,
       row_number() over (partition by g.genre_name order by count(*) desc, r.actor_id) as rn
    from genre g
    inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
    inner join role r on mhg.movie_id = r.movie_id
    group by g.genre_name, r.actor_id
)
select genre_name, actor_id, max_value from cte where rn = 1
Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53