Query parallelization for single connection in Postgres

Question

I am aware that multiple connections use multiple CPU cores in Postgres and hence run in parallel. But when I execute a long-running query, say 30 seconds (let's assume that it cannot be optimized further), the I/O is blocked and no other query from the same client/connection runs.

Is this by design, or can it be improved?

So I am assuming that the best way to run long-running queries is to get a new connection, or not to run any other query on the same connection until that query is complete?

This is by design and cannot (currently) be changed. You will need to open a second connection if you want to do work in parallel. And even if Postgres were able to use multiple cores in the backend for a single query, the connection that started that query would still be blocked. — , Sep 17 '15 at 12:15
I think this may answer your question: http://stackoverflow.com/questions/11620263/postgresql-multiple-transactions-on-the-same-connection — mustaccio, Sep 17 '15 at 12:15
That does not completely answer my question, but it gives more insight, thanks for that. @a_horse_with_no_name: So is my assumption correct here: if it is a long-running query, run it on a new connection if connections are cheap, or don't run any queries that require quick turnaround on the same connection? — Greedy Coder, Sep 17 '15 at 12:18
Posted the comment after you edited. Makes sense now. Thanks :) — Greedy Coder, Sep 17 '15 at 12:20

score 7 · Accepted Answer · answered Sep 18 '15 at 04:30

It is a design limitation.

PostgreSQL uses one process per connection, and has one session per process. Each process is single-threaded and makes heavy use of globals inherited via fork() from the postmaster. Shared memory is managed explicitly.

This has some big advantages in ease of development, debugging and maintenance, and makes the system more robust in the face of errors. However, it makes it significantly harder to add parallelization on a query level.

There's ongoing work to add parallel query support, but at present the system is really limited to using one CPU core per query. It can benefit from parallel I/O in some areas, like bitmap index scans (via effective_io_concurrency), but not in others.

There are some IMO pretty hacky workarounds like PL/Proxy, but mostly you have to deal with parallelization yourself client-side if it's needed. This is rapidly becoming one of the more significant limitations impacting PostgreSQL. Applications can split up large queries into multiple smaller queries that each affect a subset of the data, then unify the results client-side (or into an unlogged table that then gets further processed), i.e. a map/reduce-style pattern. If a mix of big long-running queries and low-latency OLTP queries is needed, multiple connections are required and the app should usually use an internal connection pool.
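
A minimal sketch of that client-side map/reduce pattern, in Python. This is an illustration under stated assumptions, not a recommended implementation: the helper names (`split_range`, `run_chunk`, `parallel_sum`) are made up for this example, `connect` is any zero-argument function returning a DB-API 2.0 connection, and the query is assumed to return a single aggregate value for a half-open key range. Each chunk runs on its own connection, so each is served by its own backend process.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the half-open range [lo, hi) into at most `parts` contiguous chunks."""
    step = max(1, -(-(hi - lo) // parts))  # ceiling division, at least 1
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def run_chunk(connect, sql, bounds):
    """Run one partial query on its own connection and return its single value."""
    conn = connect()  # hypothetical factory; one connection per chunk
    try:
        cur = conn.cursor()
        cur.execute(sql, bounds)  # bounds = (chunk_lo, chunk_hi)
        return cur.fetchone()[0]
    finally:
        conn.close()


def parallel_sum(connect, sql, lo, hi, workers=4):
    """Map: one aggregate query per key range. Reduce: add the partial sums."""
    chunks = split_range(lo, hi, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda b: run_chunk(connect, sql, b), chunks))
    # SUM over an empty range yields NULL (None), so treat it as 0.
    return sum(p or 0 for p in partials)
```

With psycopg2, for instance, `connect` could be `lambda: psycopg2.connect(dsn)` and `sql` something like `SELECT sum(amount) FROM orders WHERE id >= %s AND id < %s`, where the `orders` table, its columns and `dsn` are placeholders. Opening a connection per chunk keeps the sketch short; an application that does this often would more likely draw the connections from a pool, as suggested above.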

... And if you're going to implement manual parallelism, you may find that partitioning the major table(s) is helpful (all the usual caveats about partitioning still apply, of course). — David Aldridge, Sep 18 '15 at 06:53
