I have tables called Movie, Genre and Keyword, from which I have created a view called 'genkeyword'. The view 'genkeyword' has a lot of tuples, so it can be accessed at DB Fiddle.
The following query calculates the frequency of genres that the movies in genkeyword have in common with Harry Potter and the frequency of keywords that the movies in genkeyword have in common with Harry Potter. It then combines the results of the two queries using a full outer join.
SELECT
*
FROM
(
SELECT
title,
year,
count(distinct genre) as genre_freq
FROM
genkeyword
where
(
genre in (
select
genre
from
genkeyword
where
title = 'Harry Potter and the
Deathly Hallows'
)
)
AND title <> 'Harry Potter and the Deathly Hallows'
group by
title,
year
) a
FULL OUTER JOIN (
select
title,
year,
count(distinct keyword) as keyword_freq
FROM
genkeyword
where
keyword in (
select
keyword
from
genkeyword
where
title = 'Harry Potter and
the Deathly Hallows'
)
and title <> 'Harry Potter and the Deathly Hallows'
group by
title,
year
) b ON b.title = a.title;
The output of the above query is the following:
title | year | genre_freq | title | year | keyword_freq
Cinderella 2015 2 Cinderella 2015 2
Enchanted 2007 1 Enchanted 2007 3
How to train your dragon 2010 2 null null null
The Shape of Water 2017 2 The Shape of Water 2017 1
I know that for two result sets A and B, a full outer join outputs the rows that match in A and B, as well as the rows in A that don't have matching rows in B and the rows in B that don't have matching rows in A.
However, in the third row of the output (where the movie is 'How to train your dragon'), why are there null values for the title and year attribute? I know that the for the keyword_freq the value would be null because the movie does not have any keywords in common with Harry Potter, but wouldn't the title and year attributes have the values 'How to train your dragon' and 2010 respectively?
Any insights are appreciated.