This could be much faster and simpler with a query using JOINs.
Something like this:
SELECT b.*, c.position, l.position
FROM books b
JOIN characternames c USING (bookid)
JOIN locations l USING (bookid)
WHERE c.name = 'XXX'
AND l.locname = 'YYY';
More info after comment
"Thousands of books" are no problem at all for a RDBMS like PostgreSQL that is designed to handle millions. The key to performance with large tables are proper indexes. For the queries here the following indexes will potentially help:
CREATE INDEX books_bookid_idx ON books(bookid); -- a primary key will do, too
CREATE INDEX cn_bookid_idx ON characternames (bookid);
CREATE INDEX cn_name_idx ON characternames (name);
CREATE INDEX locations_bookid_idx ON locations (bookid);
CREATE INDEX locations_locname_idx ON locations (locname);
Multicolumn indexes may perform even better. Test with EXPLAIN ANALYZE
, it will show you which indexes get used and how fast the query is. Creating indexes is very fast, experimenting with them is easy. Just don't keep indexes you don't need. They carry a maintenance cost, too.
Optimized query
I think
I understand now, what you are looking for. This query should be optimized to get all positions of locations or names per bookid
, but only where name and location show up in the same book, and no further details per book:
WITH b AS (
SELECT bookid
FROM characternames
WHERE name = 'XXX'
GROUP BY 1
INTERSECT
SELECT bookid
FROM locations
WHERE l.locname = 'YYY'
GROUP BY 1
)
SELECT bookid, position, 'char' AS what
FROM b
JOIN characternames USING (bookid)
WHERE name = 'XXX'
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM b
JOIN locations USING (bookid)
WHERE locname = 'YYY'
ORDER BY bookid, position;
Major points
- The CTE (
WITH
query) makes sure the base query is only executed once.
INTERSECT
picks only bookids
that feature both location and name.
- The
UNION ALL
in the final SELECT
returns all found positions. Use UNION
instead if you want to trim duplicates with the same position.
- I order by
bookid, position
- guessing that is what's needed.
- Added a column
what
to tag the source (location or name) of a position.
Further optimization
If search terms appear many times per book you could considerably speed up the search by creating auxiliary tables with distinct entries for (bookid, term)
. Create a multicolumn primary index on the two columns and an additional one on just term
. Create one such table for locations and another one for names. Keep them up to date with triggers if need should be, but I assume the content of books is not changing much. Would simplify and speed up the CTE.
If that still isn't fast enough, look into Full Text Search.