
I need to run multiple (lots of!) selects on a table (obviously simplified):

libraries_books
---------------
id
library_id
book_id

where I'll be looking for different book_id's on the same library_id.

Now, I'm aware of temporary tables:

SELECT id, book_id INTO TEMPORARY tmp_books FROM libraries_books WHERE library_id = ?

and then one can optionally add indexes on tmp_books and run the queries against it instead of libraries_books, but I've got a feeling there's another way to achieve this in a more performant fashion. Is there?
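
To make the intent concrete, here is a minimal sketch of that approach (the table and column names come from the schema above; the index name and the literal values are just placeholders):

    -- Materialize the subset for one library into a temporary table.
    CREATE TEMPORARY TABLE tmp_books AS
        SELECT id, book_id
        FROM   libraries_books
        WHERE  library_id = 42;      -- placeholder value

    -- Optionally index the temporary table and refresh planner statistics.
    CREATE INDEX tmp_books_book_id_idx ON tmp_books (book_id);
    ANALYZE tmp_books;

    -- Subsequent lookups then hit the much smaller temporary table.
    SELECT id FROM tmp_books WHERE book_id = 123;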

raveren
  • You haven't actually got a question here. – Richard Huxton Sep 17 '13 at 21:00
  • How much extra performance do you actually need beyond a single index on `(library_id, book_id)`? – MatBailie Sep 17 '13 at 21:11
  • What are those "multiple selects" on that table? Are you sure you cannot combine that into a single statement? Using temporary tables for something like that sounds a bit strange. –  Sep 17 '13 at 21:14
  • I edited the question with the question :) I also removed surplus information about the upsert, but thanks for that... The table is going to become really large and I am going to do tens of thousands of `select`s on it at a time (complicated import script), *and* some of the queries are based on other columns too (besides `book_id`), so performance **is** an issue. Also, it's the type of general problem I often come across, and I just thought there'd be a solution I'm not aware of. – raveren Sep 17 '13 at 21:18
  • you may see some performance improvements by partitioning the table on Library_ID. – Declan_K Sep 17 '13 at 21:23
  • @a_horse_with_no_name: just moments ago I removed the explanation for the need of multiple selects - I'm doing an upsert with postgres 8.4 and as I see it, there's no non-hacky way to do that: http://stackoverflow.com/q/1109061/179104 – raveren Sep 17 '13 at 21:23
  • Postgres 8.4 is going to be de-supported in the middle of next year, so you should plan your upgrade soon anyway. Maybe a good reason to upgrade earlier and make use of writeable CTEs (see the sketch after these comments). –  Sep 17 '13 at 21:26
  • I'll definitely bitch to my boss about it, but it's not possible at this time. I guess I've ended up having to profile simple selects vs. ones on temporary tables. – raveren Sep 17 '13 at 21:31
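
For reference, the writeable-CTE upsert mentioned in the last comment could look roughly like the sketch below. This needs PostgreSQL 9.1 or later, the literal values are placeholders, and it is not safe against concurrent writers without extra locking:

    WITH updated AS (
        UPDATE libraries_books
        SET    book_id = 123        -- no-op update, only used to detect an existing row
        WHERE  library_id = 42 AND book_id = 123
        RETURNING id
    )
    INSERT INTO libraries_books (library_id, book_id)
    SELECT 42, 123
    WHERE NOT EXISTS (SELECT 1 FROM updated);

For a pure link table like this one the pattern degenerates to "insert if not already present"; the UPDATE branch only matters when there are extra columns to refresh.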

1 Answer


Premature optimization is the root of all evil. In PostgreSQL you won't go far wrong by:

  1. Normalizing your data
  2. Applying indexes to the few tables that need them (see the sketch just after this list)
  3. Tweaking indexes as needed
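
As a concrete example of step 2 for the table in the question (the index name is illustrative), a single composite index covers both "all books in a library" and "is this book in this library?" lookups:

    CREATE INDEX libraries_books_library_book_idx
        ON libraries_books (library_id, book_id);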

In general PostgreSQL has tons of tricks to try to make your use case fast. Not only do you have caching in the db, but concurrent selects against the same table can also share work when they both have to do a sequential scan (the second query piggybacks on the pages the first query's scan has already read).
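
That scan piggybacking is governed by the `synchronize_seqscans` setting, which is on by default; a quick way to check it:

    SHOW synchronize_seqscans;   -- 'on' unless it has been disabled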

If you end up with tons of records, consider partial indexes or other strategies.
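
For example, if most of the heavy querying targets one known library, a partial index keeps the index small (the predicate and the name below are purely illustrative, not taken from the question):

    CREATE INDEX libraries_books_hot_library_idx
        ON libraries_books (book_id)
        WHERE library_id = 42;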

Chris Travers