3

Looking here, it is clear that Oracle supports execution of DDL commands in parallel with scenarios clearly listed. I was wondering whether Postgres does indeed offer such functionality? I can find a lot of material on "parallel queries" for PostgreSQL but not so much when DDL is involved.

For example, can I execute multiple 'CREATE TABLE...AS SELECT' in parallel? And if not, how can I achieve such functionality? What happens if I have a temporary table (CREATE TEMP TABLE)? Do I need to configure something for locks?

Zeruno
  • 1,391
  • 2
  • 20
  • 39
  • Those are index rebuilds, not just any type of DDL. They can be paralllelized. `CREATE TABLE` doesn't need to be parallelized for example, it's a metadata operation. `CREATE .. AS SELECT` is half D*M*L, and that query can be parallelized. – Panagiotis Kanavos Feb 21 '20 at 17:30
  • "*can I execute multiple 'CREATE TABLE...AS SELECT' in parallel*" - yes of course (as long as they don't create the same table). –  Feb 21 '20 at 17:35
  • 1
    `Parallel` and `Concurrent` are two completely different things. `Parallel` means using multiple CPUs/threads to handle lots of data in a single operation. `Concurrent` means doing many operations at the same time. – Panagiotis Kanavos Feb 21 '20 at 17:42
  • @PanagiotisKanavos Why are you highlighting the difference between concurrency and parallelism? – Zeruno Feb 21 '20 at 17:47
  • @Zeruno because it matters a *LOT*. If you want to make index rebuilding or creating a table from results go faster, you're looking for parallelization, not concurrency. You *can't* make an INSERT go faster by manually running 4 or 8 INSERT operations for example, they'll end up blocking each other. – Panagiotis Kanavos Feb 21 '20 at 17:48
  • @a_horse_with_no_name you answered that `CREATE TABLE AS` would be parallelized in PostgreSQL 11 in a comment [here](https://stackoverflow.com/questions/50702049/creating-table-or-insert-into-with-parallel-workers) – Panagiotis Kanavos Feb 21 '20 at 17:53
  • @PanagiotisKanavos: the question "*can a single CREATE TABLE make use of parallel query*" is something completely different then "*can I execute multiple CREATE TABLE AS in parallel*" –  Feb 21 '20 at 18:04
  • @Zeruno: it's unclear (at least to me) what kind of "parallel" you are talking about. Running multiple DDL statements concurrently from different connections. Or are you referring to a **single** DDL statement that parallelizes it's work. –  Feb 21 '20 at 18:06
  • @a_horse_with_no_name that's why I made the distinction between parallel and concurrent. The OP's link talks about parallelizing `CREATE TABLE ... AS SELECT` too, not concurrent executions. – Panagiotis Kanavos Feb 21 '20 at 18:08
  • @a_horse_with_no_name both are of great interest to me, I am keeping my question generic. I can see that there is a lot of material on the former but at this point, the latter (multiple CREATE TABLE AS in parallel) is not so clear to me. Yes, concurrency is possible but that is not what I am exactly interested in. Can I expect a speedup? – Zeruno Feb 21 '20 at 18:10
  • @Zeruno those are *concurrent* statements, not parallel. They aren't interchangeable. If you mix up the terms you'll never understand what the docs say. *All* operations in the database can run concurrently except some of those that modify the database itself. That's why you can have multiple connections executing the same command, as long as it doesn't cause conflicts (like creating objects with the same name). You get a speedup because you *don't* have to wait for those commands to execute sequentially. You won't get linear speedup because all those connections use the same IO, CPU, RAM – Panagiotis Kanavos Feb 26 '20 at 08:49
  • @Zeruno this sounds like [an XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) though. You have a problem X, assume Y is the solution so when Y fails, you ask about Y, not the real problem X. What is the *real* problem here? Do you have a problem with data movement and assumed that concurrent `CREATE ..AS` calls would solve it? – Panagiotis Kanavos Feb 26 '20 at 08:51
  • @PanagiotisKanavos not sure what made you think I am using the statements interchangeably? We know that concurrency is possible. And that the possibility of speedups is created under * specific * situations. I am interested in exploring the latter beyond general comments. Regarding the XY problem comment, as you can see in the comments there, it can sometimes be useful. I can't reveal too much information without this discussion going offtopic even further and making the problem larger than it is. – Zeruno Feb 26 '20 at 13:54

2 Answers2

1

From here:

Even when it is in general possible for parallel query plans to be generated, the planner will not generate them for a given query if any of the following are true:

  • The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated.

(emphasis mine).

Which seems to suggest that Postgres will not "parallelize" any query that modifies the database structure, under any circumstances.

Running multiple queries simultaneously in Postgres requires one connection per running query.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
  • That's only for DMLs. The operations in the OP's question are index operations that *can* be paralllelised. Check for `Parallel Index Build` in the `Notes` section of [CREATE INDEX](https://www.postgresql.org/docs/11/sql-createindex.html) – Panagiotis Kanavos Feb 21 '20 at 17:28
  • @PanagiotisKanavos: Sounds like the beginnings of a good answer. – Robert Harvey Feb 21 '20 at 17:30
1

Those are generic DDL statements, they are index operations and partition operations that can be parallelized.

If you check the Notes section of the CREATE INDEX statement, you'll see that parallel index building is supported :

PostgreSQL can build indexes while leveraging multiple CPUs in order to process the table rows faster. This feature is known as parallel index build. For index methods that support building indexes in parallel (currently, only B-tree), maintenance_work_mem specifies the maximum amount of memory that can be used by each index build operation as a whole, regardless of how many worker processes were started. Generally, a cost model automatically determines how many worker processes should be requested, if any.

Update

I suspect the real question is about CREATE TABLE ... AS though.

This is essentially a CREATE TABLE followed by an INSERT .. SELECT. The CREATE TABLE part can't be parallelized and doesn't have to - it's essentially a metadata operation. The SELECT on the other hand, could be parallelized easily. INSERT is a bit harder, but it's a matter of implementation.

As a_horse_with_no_name explains in a comment to this question, parallelization for CREATE TABLE AS was added in PostgreSQL 11 :

Improvements to parallelism, including:

  • CREATE INDEX can now use parallel processing while building a B-tree index
  • Parallelization is now possible in CREATE TABLE ... AS, CREATE MATERIALIZED VIEW, and certain queries using UNION
  • Parallelized hash joins and parallelized sequential scans now perform better
Community
  • 1
  • 1
Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236