Inspired by this StackOverflow question:
Find mutual element in different facts in swi-prolog
We have the following
Problem statement
Given a database of "actors starring in movies" (starsin is the relation linking actor "bob" to movie "a" for example)
starsin(a,bob). starsin(c,bob). starsin(a,maria). starsin(b,maria). starsin(c,maria). starsin(a,george). starsin(b,george). starsin(c,george). starsin(d,george).
And given set of movies M, find those actors that starred in all the movies of M.
The question was initially for Prolog.
Prolog solution
In Prolog, an elegant solution involves the predicate
setof/3
,
which collects possible variable instantiations into a set (which is really list without
duplicate values):
actors_appearing_in_movies(MovIn,ActOut) :-
setof(
Ax,
MovAx^(setof(Mx,starsin(Mx,Ax),MovAx), subset(MovIn,MovAx)),
ActOut
).
I won't go into details about this, but let's look at the test code, which is of interest here. Here are five test cases:
actors_appearing_in_movies([],ActOut),permutation([bob, george, maria],ActOut),!.
actors_appearing_in_movies([a],ActOut),permutation([bob, george, maria],ActOut),!.
actors_appearing_in_movies([a,b],ActOut),permutation([george, maria],ActOut),!.
actors_appearing_in_movies([a,b,c],ActOut),permutation([george, maria],ActOut),!.
actors_appearing_in_movies([a,b,c,d],ActOut),permutation([george],ActOut),!.
A test is a call to the predicate actors_appearing_in_movies/2
, which is given
the input list of movies (e.g. [a,b]
) and which captures the resulting list of
actors in ActOut
.
Subsequently, we just need to test whether ActOut
is a permutation of the expected
set of actors, hence for example:
permutation([george, maria],ActOut)`
"Is ActOut
a list that is a permutation of the list [george,maria]
?.
If that call succeeds (think, doesn't return with false
), the test passes.
The terminal !
is the cut operator and is used to tell the Prolog engine to not
reattempt to find more solutions, because we are good at that point.
Note that for the empty set of movies, we get all the actors. This is arguably correct: every actors stars in all the movies of the empty set (Vacuous Truth).
Now in SQL.
This problem is squarely in the domain of relational algebra, and there is SQL, so let's have a go at this. Here, i'm using MySQL.
First, set up the facts.
DROP TABLE IF EXISTS starsin;
CREATE TABLE starsin (movie CHAR(20) NOT NULL, actor CHAR(20) NOT NULL);
INSERT INTO starsin VALUES
( "a" , "bob" ),
( "c" , "bob" ),
( "a" , "maria" ),
( "b" , "maria" ),
( "c" , "maria" ),
( "a" , "george" ),
( "b" , "george" ),
( "c" , "george" ),
( "d", "george" );
Regarding the set of movies given as input, giving them in the form of a (temporary) table sounds natural. In MySQL, "temporary tables" are local to the session. Good.
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b");
Approach:
The results can now be obtained by getting, for each actor, the intersection of the set of
movies denoted by movies_in
and the set of movies in which an actor ever appeared
(created for each actor via the inner join), then counting (for each actor) whether the
resulting set has at least as many entries as the set movies_in
.
Wrap the query into a procedure for practical reasons. A delimiter is useful here:
DELIMITER $$
DROP PROCEDURE IF EXISTS actors_appearing_in_movies;
CREATE PROCEDURE actors_appearing_in_movies()
BEGIN
SELECT
d.actor
FROM
starsin d, movies_in q
WHERE
d.movie = q.movie
GROUP BY
actor
HAVING
COUNT(*) >= (SELECT COUNT(*) FROM movies_in);
END$$
DELIMITER ;
Run it!
Problem A appears:
Is there a better way than edit + copy-paste table creation code,
issue a CALL
and check the results "by hand"?
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
CALL actors_appearing_in_movies();
Empty set!
Problem B appears:
The above is not desired, I want "all actors", same as for the Prolog solution. As I do not want to tack a weird edge case exception onto the code, my approach must be wrong. Is there one which naturally covers this case but doesn't become too complex? T-SQL and PostgreSQL one-liners are fine too!
The other test cases yield expected data:
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
| maria |
+--------+
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b"), ("c");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
| maria |
+--------+
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b"), ("c"), ("d");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
+--------+