I have a database with
books (primary key: bookID)
characterNames (foreign key: books.bookID)
locations (foreign key: books.bookID)
The in-text-position of character names and locations are saved in the corresponding tables.
I'm writing a Pythonscript using psycopg2, finding all occurences of given character names and locations in books. I only want the occurences in books, where both the character name AND the location are found.
Here I already got a solution for searching one location and one character:
WITH b AS (
SELECT bookid
FROM characternames
WHERE name = 'XXX'
GROUP BY 1
INTERSECT
SELECT bookid
FROM locations
WHERE l.locname = 'YYY'
GROUP BY 1
)
SELECT bookid, position, 'char' AS what
FROM b
JOIN characternames USING (bookid)
WHERE name = 'XXX'
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM b
JOIN locations USING (bookid)
WHERE locname = 'YYY'
ORDER BY bookid, position;
The CTE 'b' contains all bookid's, where the character name 'XXX' and the location 'YYY' appear.
Now I'm additionally wondering about searching for 2 places and a name (or 2 names and a place respectively). It's simple if all searched entities must occur in one book, but what about this:
Searching for: Tim, Al, Toolshop
Results: books including
(Tim, Al, Toolshop) or
(Tim, Al) or
(Tim, Toolshop) or
(Al, Toolshop)
The problem could be repeated for 4, 5, 6...conditions.
I thougt about INTERSECTing more subqueries, but that wouldn't work.
Instead I would UNION the found bookIDs, GROUP them and select bookid's occurring more then once:
WITH b AS (
SELECT bookid, count(bookid) AS occurrences
FROM
(SELECT DISTINCT bookid
FROM characterNames
WHERE name='XXX'
UNION
SELECT DISTINCT bookid
FROM characterNames
WHERE name='YYY'
UNION
SELECT DISTINCT bookid
FROM locations
WHERE locname='ZZZ'
GROUP BY bookid)
WHERE occurrences>1)
I think this works, can't test it at the moment, but is it the best way to do this?