I have tables called Movie, Genre and Keyword, from which I have created a view called 'genkeyword'. The view 'genkeyword' has a lot of tuples, so it can be accessed at DB Fiddle.
I have the following query:
SELECT title,
year,
Count(DISTINCT genre) AS genre_freq,
Count(DISTINCT keyword) AS keyword_freq
FROM genkeyword
WHERE ( genre IN (SELECT genre
FROM genkeyword
WHERE title = 'Harry Potter and the Deathly Hallows')
OR keyword IN (SELECT keyword
FROM genkeyword
WHERE title = 'Harry Potter and the Deathly Hallows') )
AND title <> 'Harry Potter and the Deathly Hallows'
GROUP BY title,
year
ORDER BY genre_freq DESC,
keyword_freq DESC;
What I am intending to do with this query is to get the genre and keyword frequency for each movie that has genres and keywords that are in common with Harry Potter: The output should be:
title | genre_freq | keyword_freq
Cinderella 2 2
The Shape of Water 2 1
How to Train Your Dragon 2 0
Enchanted 1 3
I know that the query is not correct, since the I get the following output instead:
title | genre_freq | keyword_freq
The Shape of Water 4 3
Enchanted 3 4
Cinderella 2 5
How to Train Your Dragon 2 3
However, I would like to clarify my understanding about how the query works.
In the 'where' clause of my query:
where (genre in (select genre from genkeyword where title='Harry Potter') or
keyword in (select keyword from genkeyword where title='Harry Potter'))
Am I right in saying that there are two result sets generated, one containing all the tuples which have a genre that is in Harry Potter (let this be R1) and the other containing all the tuples that have a keyword that is in Harry Potter (let this be R2)?
If the tuple under consideration contains a genre that is in the genre result set R1 or a keyword that is in the keyword result set R2, then the genre/keyword is counted. I am not sure how count(distinct genre) and count(distinct keyword) works in this case. If the tuple contains a genre that is in R1, is only the genre counted or is the keyword counted as well? This is the same for the case when the tuple contains a keyword in R2, is the genre counted as well as the keyword?
I don't understand why I am getting the genre_freq and keyword_freq values wrong from my query. This is because I don't fully understand how the genre and keyword frequencies are getting counted behind-the-scenes. Any insights are appreciated.