4

I always thought that CTEs should be considered as an inline view macro. So my thinking is: if the CTE is not referenced/used, it is not executed. It is just a definition, nothing more.

But, take the following query:

create table t
(
    id int primary key
);

with
a as
(
    insert into t(id) values(1)
)
select false;

select * from t;

It seems that after the CTE-based query, select * from t returns the tuple as inserted in the CTE. Why is this tuple inserted, despite the fact that the CTE is not used?

Is this by design or specification? Is it safe to rely on this behavior? This allows to execute multiple queries which are totally uncorrelated in one single query.

This seems to contradict the following information: https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/#comment-19121

Katrine Bers
  • 115
  • 7

2 Answers2

4
  • syntactically a CTE behaves like any other table expression.

  • semantically it is different. [in Postgres] it will always be executed once , even if it is referenced more than once.

  • [in Postgres] a CTE will act as an optimisation barrier; query terms cannot be moved between (into or outof) the CTE and the main query.


The second and third point can have serious implications. Because of the barrier&exactly once, a CTE-scan can hardly make use of implicit order or the presence of indexes inside the CTE. A CTE scan more or less behaves like a sequential scan on an unordered table or materialised view. For small CTEs this will be no problem, since a hash-join can be used. Large CTE's will need materialising+sorting to join the CTE to the main query.

wildplasser
  • 43,142
  • 8
  • 66
  • 109
0

In postgres, a cte should not be considered an inline view, although thinking of it as a materialized view that lives within a scope of a statement is useful. A CTE will be materialized if it is referenced in another part of the query, or if it is alters data (INSERT / UPDATE / DELETE).

So, since your example alters data, the CTE is evaluated, whereas the link you refer to has a CTE that doesn't alter data.

Whereas in other databases, predicates from the outer query will be pushed down to the CTE by the optimizer, in postgresql, the CTE will be fully materialized.

e.g.

WITH cte AS (SELECT * FROM foo WHERE foo.bar = True)
SELECT * FROM cte WHERE cte.id > 10 AND cte.id < 20

is slower than in postgresql

SELECT * FROM (SELECT * FROM foo WHERE bar = TRUE) cte
WHERE cte.id > 10 AND cte.id < 20

This is a consideration where one has optional or dynamic predicates in the outer query. The slightly faster CTE version would be

WITH cte AS (SELECT * FROM foo WHERE foo.bar = True AND foo.id > 10 AND foo.id < 20)
SELECT * FROM cte

This is by design, and you can rely on this behaviour to create an optimization barrier.

CTEs are allowed wherever a SELECT clause is allowed. So, it is possible to use CTEs inside of INSERT, UPDATE or DELETE statements. I'm not sure if this is part of the SQL standard.

For instance, until version 9.5 and the introduction of INSERT ... ON CONFLICT syntax, we could use a CTE to perform an UPSERT. Here's a SO thread that illustrates with an example

There is a 2nd more interesting type of CTE, RECURSIVE CTE, where a CTE is composed of a union of an iterative part and an applicative part that can work on values generated in the iterative part. I don't think this type of query could be inlined anyway.

Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85