5

Suppose I want to query a big table with a few WHERE filters. I am using Postgres 11 and a foreign table; foreign data wrapper (FDW) is clickhouse_fdw. But I am also interested in a general solution.

I can do so as follows:

SELECT id,c1,c2,c3 from big_table where id=3 and c1=2

My FDW is able to do the filtering on the remote foreign data source, ensuring that the above query is quick and doesn't pull down too much data.

The above works the same if I write:

SELECT id,c1,c2,c3 from big_table where id IN (3,4,5) and c1=2

I.e all of the filtering is sent downstream.

However, if the filtering I'm trying to do is slightly more complex:

SELECT bt.id,bt.c1,bt.c2,bt.c3
from big_table bt
join lookup_table l on bt.id=l.id
where c1=2 and l.x=5

then the query planner decides to filter on c1=2 remotely but apply the other filter locally.

In my use case, calculating which ids have l.x=5 first and then sending those off to be filtered remotely will be much quicker, so I tried to write it the following way:

SELECT id,c1,c2,c3
from big_table
where c1=2
and id IN (select id from lookup_table where x=5)

However, the query planner still decides to perform the second filter locally on ALL of the results from big_table that satisfy c1=2, which is very slow.

Is there some way I can "force" (select id from lookup_table where x=5) to be pre-calculated and sent as part of a remote filter?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
ctlaltdefeat
  • 778
  • 1
  • 6
  • 12
  • 1
    I think that the question is entirely sensible even without involving the use of any fdw; having some kind of control over the query planner's execution in cases like this seems like a general matter. In any case, I'm using clickhouse_fdw and Postgres 11. – ctlaltdefeat Apr 29 '20 at 02:07
  • I think qual pushdown to a foreign server is rather different from the general case in plain SQL - and depends a lot on the actual FDW. – Erwin Brandstetter Apr 29 '20 at 02:47

1 Answers1

5

Foreign data wrapper

Typically, joins or any derived tables from subqueries or CTEs are not available on the foreign server and have to be executed locally. I.e., all rows remaining after the simple WHERE clause in your example have to be retrieved and processed locally like you observed.

If all else fails you can execute the subquery SELECT id FROM lookup_table WHERE x = 5 and concatenate results into the query string.

More conveniently, you can automate this with dynamic SQL and EXECUTE in a PL/pgSQL function. Like:

CREATE OR REPLACE FUNCTION my_func(_c1 int, _l_id int)
   RETURNS TABLE(id int, c1 int, c2 int, c3 int) AS
$func$
BEGIN
   RETURN QUERY EXECUTE
     'SELECT id,c1,c2,c3 FROM big_table
      WHERE  c1 = $1
      AND    id = ANY ($2)'
   USING _c1
       , ARRAY(SELECT l.id FROM lookup_table l WHERE l.x = _l_id);
END
$func$  LANGUAGE plpgsql;

Related:

Or try this search on SO.

Or you might use the meta-command \gexec in psql. See:

Or this might work: (Feedback says does not work.)

SELECT id,c1,c2,c3
FROM   big_table
WHERE  c1 = 2
AND    id = ANY (ARRAY(SELECT id FROM lookup_table WHERE x = 5));

Testing locally, I get a query plan like this:

Index Scan using big_table_idx on big_table (cost= ...)
  Index Cond: (id = ANY ($0))
  Filter: (c1 = 2)
  InitPlan 1 (returns $0)
    ->  Seq Scan on lookup_table  (cost= ...)
          Filter: (x = 5)

Bold emphasis mine.

The parameter $0 in the plan inspires hope. The generated array might be something Postgres can pass on to be used remotely. I don't see a similar plan with any of your other attempts or some more I tried myself. Can you test with your fdw?

Related question concerning postgres_fdw:

General technique in SQL

That's a different story. Just use a CTE. But I don't expect that to help with the FDW.

WITH cte AS (SELECT id FROM lookup_table WHERE x = 5)
SELECT id,c1,c2,c3
FROM   big_table b
JOIN   cte USING (id)
WHERE  b.c1 = 2;

PostgreSQL 12 changed (improved) behavior, so that CTEs can be inlined like subqueries, given some preconditions. But, quoting the manual:

You can override that decision by specifying MATERIALIZED to force separate calculation of the WITH query

So:

WITH cte AS MATERIALIZED (SELECT id FROM lookup_table WHERE x = 5)
...

Typically, none of this should be necessary if your DB server is configured properly and column statistics are up to date. But there are corner cases with uneven data distribution ...

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I just tried your suggestion and got a query plan involving ```InitPlan 1 (returns $0)``` but alas the ```Remote SQL``` still did not include it. Could you please point me to one of your related answers with dynamic SQL and ```EXECUTE```? Thanks! – ctlaltdefeat Apr 29 '20 at 02:48
  • @ctlaltdefeat: Pity. Maybe it works with postgres_fdw in Postgres 12. No test environment here ATM. Anyway, I added a solution with dynamic SQL and some more above. – Erwin Brandstetter Apr 29 '20 at 03:12
  • MATERIALIZED with saved me tons of time and code. – SüniÚr Feb 07 '23 at 11:24