
Currently I'm facing a problem where Postgres's query planner makes bad decisions based on what seem to be bad estimates. In a larger query the planner chooses to do the (hash) join from this post as its first/innermost part, because the estimate for this join is only 274 rows, while joining these two tables actually produces 31770 rows. That leads to nested loops over those 31770 rows in the larger query, although there are definitely cheaper paths that the planner would have considered if it had known the correct row count, i.e. if the estimate had been better.

This can be used to reproduce the problem:

CREATE TABLE b (id int primary key, name text, is_visible boolean);
INSERT INTO b (id, name, is_visible) SELECT x, 'B #' || x, CASE WHEN x % 116 = 0 THEN true ELSE false END FROM generate_series(1, 29499) AS x;
CREATE INDEX ON b (is_visible);

CREATE TABLE a (id bigserial primary key, b_id int references b(id), name text);

WITH dist AS (
    SELECT '{7236,4431,3012,2339,2246,1907,1661,1356,1173,1029,533,505,415,354,336,275,188,168,168,153,152,133,126,125,113,90,73,72,65,64,64,48,46,35,34,31,26,26,26,25,25,25,24,22,22,21,20,20,19,19,15,15,15,15,13,13,12,12,12,12,12,11,11,11,11,8,8,8,8,8,8,8,8,7,7,7,6,6,6,6,6,6,6,6,6,6,5,5,5,5,5,5,5,5,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}'::INT[] cnt
)
INSERT INTO a (b_id, name)
SELECT x.id, x.index || '/' || y
FROM (SELECT id, row_number() over (ORDER BY id) AS index FROM b WHERE is_visible = true) AS x,
generate_series(1, (SELECT cnt[x.index] FROM dist)) AS y
ORDER BY x.id;

I tried to reproduce the same distribution as in my real data, hence the strange array of counts. That actually seems to have worked, because when I execute the following query on the test data, the estimated and actual row counts are exactly the same as on my real data:

postgres=> explain analyze select * from a join b on b.id = a.b_id where b.is_visible=true;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=35.88..841.99 rows=274 width=31) (actual time=5.251..637.037 rows=31770 loops=1)
   Hash Cond: (a.b_id = b.id)
   ->  Seq Scan on a  (cost=0.00..722.70 rows=31770 width=18) (actual time=0.020..205.874 rows=31770 loops=1)
   ->  Hash  (cost=32.70..32.70 rows=254 width=13) (actual time=5.195..5.212 rows=254 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 20kB
         ->  Index Scan using b_is_visible_idx on b  (cost=0.29..32.70 rows=254 width=13) (actual time=0.044..2.728 rows=254 loops=1)
               Index Cond: (is_visible = true)
 Planning Time: 0.374 ms
 Execution Time: 836.821 ms
(9 rows)

What can I do to improve the estimate?

[Edit]
I may have asked the wrong question: I'm actually not interested in optimizing this particular query, but rather in understanding where the estimate of 274 comes from and how to get an estimate closer to the actual row count of 31770.
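As far as I can tell (this is my own back-of-the-envelope reading of the statistics, so it may well be off), the 274 looks like rows(a) * rows(b after the is_visible filter) / distinct values of b.id, i.e. the planner has no way of knowing that a.b_id only ever references the visible rows of b:

-- inputs the planner seems to work with (values from the test data above)
SELECT reltuples FROM pg_class WHERE relname = 'a';                        -- ~31770
SELECT n_distinct FROM pg_stats WHERE tablename = 'b' AND attname = 'id';  -- -1, i.e. all 29499 values are distinct
-- 31770 * 254 / 29499 ≈ 273.6, which rounds to 274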

[Edit 2]
Using Postgres 12.4

  • If I recall correctly there has been an issue with the statistics/MCV in the past, which has been corrected. Please add your Postgres version to the question. – wildplasser Mar 31 '21 at 15:52
  • Such misestimates are notoriously hard to fix. You may have to disable nested loops for that query. – Laurenz Albe Mar 31 '21 at 16:22
  • Thanks for the hint, I added the version to the original post. Disabling nested loops is not an option, because the larger query does 4 additional joins and disabling nested loops completely hurts there. What actually helped was `SET join_collapse_limit = 1;` for the larger query, which, as I understand it, stops Postgres from reordering the joins and executes the query the way I wrote it. But I would rather fix the misestimates and keep the optimizations Postgres can do, instead of disabling them all (sketch below). – Oliver Saggau Mar 31 '21 at 16:40
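    For reference, the way I scope that setting so it doesn't leak into other queries (the larger query isn't shown here, so the join from this post stands in for it):

    BEGIN;
    -- only affects this transaction
    SET LOCAL join_collapse_limit = 1;
    -- the larger multi-join query would go here, written in the desired join order
    SELECT * FROM a JOIN b ON b.id = a.b_id WHERE b.is_visible = true;
    COMMIT;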

2 Answers

  • an index on a boolean column makes little sense
  • (a conditional index makes more sense in your case)
  • for a FK you need a supporting index
  • [vacuum] ANALYZE after populating the table(s)

-- \i tmp.sql

CREATE TABLE b (id int primary key, name text, is_visible boolean NOT NULL); -- <<HERE
INSERT INTO b (id, name, is_visible) SELECT x, 'B #' || x, CASE WHEN x % 116 = 0 THEN true ELSE false END FROM generate_series(1, 29499) AS x;
-- CREATE INDEX ON b (is_visible);
CREATE UNIQUE INDEX ON b (id) WHERE is_visible = True; -- <<HERE

CREATE TABLE a (id bigserial primary key, b_id int references b(id), name text);
CREATE INDEX zzzz ON a (b_id); -- <<HERE

WITH dist AS (
    SELECT '{7236,4431,3012,2339,2246,1907,1661,1356,1173,1029,533,505,415,354,336,275,188,168,168,153,152,133,126,125,113,90,73,72,65,64,64,48,46,35,34,31,26,26,26,25,25,25,24,22,22,21,20,20,19,19,15,15,15,15,13,13,12,12,12,12,12,11,11,11,11,8,8,8,8,8,8,8,8,7,7,7,6,6,6,6,6,6,6,6,6,6,5,5,5,5,5,5,5,5,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}'::INT[] cnt
)
INSERT INTO a (b_id, name)
SELECT x.id, x.index || '/' || y
FROM (SELECT id, row_number() over (ORDER BY id) AS index FROM b WHERE is_visible = true) AS x,
generate_series(1, (SELECT cnt[x.index] FROM dist)) AS y
ORDER BY x.id;

-- I tried to achieve the same distribution of the data, hence the strange array with counts. And that actually seems to have worked, because when I execute the following query on the test data the estimates and actual rows are exactly the same:

VACUUM ANALYZE a; -- <<HERE
VACUUM ANALYZE b; -- <<HERE

explain analyze
select *
from a join b on b.id = a.b_id
where b.is_visible=true;

Resulting plan:


DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 29499
CREATE INDEX
CREATE TABLE
CREATE INDEX
INSERT 0 31770
VACUUM
VACUUM
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=11.73..615.84 rows=274 width=31) (actual time=0.285..11.952 rows=31770 loops=1)
   Hash Cond: (a.b_id = b.id)
   ->  Seq Scan on a  (cost=0.00..520.70 rows=31770 width=18) (actual time=0.006..2.577 rows=31770 loops=1)
   ->  Hash  (cost=8.55..8.55 rows=254 width=13) (actual time=0.263..0.264 rows=254 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 20kB
         ->  Index Scan using b_id_idx on b  (cost=0.14..8.55 rows=254 width=13) (actual time=0.009..0.223 rows=254 loops=1)
 Planning Time: 0.435 ms
 Execution Time: 13.011 ms
(8 rows)
wildplasser
  • Thanks. Of course all valid points. I missed the supporting index while creating the test schema, and I also forgot to mention that I ran `ANALYZE` by hand before executing the query. But as I said, this is only a small part of a larger query, and even after your changes the resulting plan still estimates 274 rows where it actually gets 31770. I'm pretty sure that's the main problem, because as far as I understand it, the query planner in my larger query chooses this path over others based on estimated cost * 274 (which should have been estimated cost * 31770). – Oliver Saggau Mar 31 '21 at 15:16
  • I also tried experimenting with `default_statistics_target` and extended statistics. Forcing `SET enable_seqscan = 0;` does result in an index+index join (with about the same running time). And sorry, I cannot comment on your invisible larger query. – wildplasser Mar 31 '21 at 15:38
  • Yes, my bad. My question was not good/precise enough. As for the larger query, I would have to invest a lot of time to come up with an anonymized schema and accurate test data, so I broke it down to this smaller section, because I think (hope) that once I get Postgres to produce a better estimate here, that should solve the problem of the larger query. – Oliver Saggau Mar 31 '21 at 15:53
  • Yes, I understand that. But your question illustrates the mis-estimate quite nicely. (BTW: the `31770` is exactly the number of rows in table `a`) – wildplasser Mar 31 '21 at 15:55
  • Yes, table `a` can only ever contain items from table `b` where `is_visible = true`. When we started, table `a` was empty, but over time all those items from table `b` got at least one entry in table `a` (some have more, some fewer). That's why the actual row count is currently all rows of table `a`, but unfortunately the query planner doesn't seem to get this fact. – Oliver Saggau Mar 31 '21 at 16:06

You said your goal is not to optimize that query's execution, but to improve its cardinality estimate so that a related query is automatically optimized.

The fundamental problem is that there are no suitable conditional statistics. The only way I see to get around that is to partition the table.

create table c (like b) partition by list (is_visible);
create table c_true partition of c for values in (true);
create table c_false partition of c for values in (false);
insert into c select * from b;
vacuum ANALYZE c;
explain analyze select * from a join c as b on b.id = a.b_id where b.is_visible=true;

Now it is smart enough to use the stats from just the correct partition for the estimate. And so the estimate is now spot on. Ordinarily I would say that partitioning a table of this size is absurd, so be sure to document why it was done.
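If you want to see what the planner now draws on, something like this should show a separate statistics row per partition after the ANALYZE above (just an inspection query using the partition names from the example, nothing the plan depends on):

select tablename, attname, n_distinct, null_frac
from pg_stats
where tablename in ('c', 'c_true', 'c_false') and attname = 'id'
order by tablename;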

It would be nice if PostgreSQL would collect and use stats on partial indexes, the same way it does on expression indexes. Then just making a partial index would be a suitable alternative to partitioning. You could also conceive of extending "CREATE STATISTICS" to take a WHERE clause. But neither of those is currently implemented.
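For completeness, the extended statistics that do exist today are created like this (the statistics name is arbitrary; as far as I know they don't help with this estimate, precisely because there is no WHERE clause to restrict them to the visible rows):

create statistics b_visible_id_stats (ndistinct, dependencies, mcv) on is_visible, id from b;
analyze b;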

jjanes
  • Sorry, I'm pretty new to the whole statistics stuff, so please bear with me, but which stats on table `c` or the partition `c_true` exactly make the join estimate suddenly dead on? Because even when I add more items with `is_visible=true` that don't have an equivalent in table `a`, the estimate is still dead on. As for partitioning table `b`, that unfortunately is not possible, because a) I have (other) columns in that (real) table that need to be unique across all partitions and b) the is_visible flag can be changed, so I would have to manually handle the move to the other partition. – Oliver Saggau Apr 01 '21 at 06:26