
I've run into an issue where PostgreSQL (13.1 on Windows, default config) starts preferring a generic plan after 5 executions of a prepared statement, even though the generic plan's row estimate is clearly wrong. The thing is, the custom plan is up to 10 times faster than the generic plan, and at least 2.5 times faster in the "worst" case. I don't want to set plan_cache_mode = force_custom_plan globally, because the query is part of an application that runs in parallel over several threads with connection pooling, alongside dozens of smaller and larger queries, and it would feel like a hacky workaround that might also be dangerous for those other queries.

I think I've tracked it down to the generic plan estimate being wrong for the x.id = ANY($1) parts in the WHERE clause, of which I have three.

The query goes like this:

SELECT ... FROM ... WHERE ... x.id = ANY($1) AND y.id = ANY($2) AND z.id = ANY($3) ...

Of course, this is hugely simplified, but I know that 99% of the time I pass an int8[] with 50 entries as the first parameter (the query backs a paged view of records, where one page has 50 records). That is correctly estimated as 50 rows in the custom plan, but as 10 rows in the generic plan.

Switching to IN ($1, $2, ... $50) fixes this, but we're currently trying to move our IN clauses over to ANY, because it's more efficient over JDBC and we've been bitten by the parameter limit a few times with IN (which doesn't happen with ANY, since the whole list is a single parameter). IN would also make the parameter count variable, so the query planner would see a different query text quite often (only $1 is almost always an array of 50 values on our prod system; the others may have fewer or more entries depending on lots of factors).
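To make the difference concrete, here is a minimal sketch of the two forms (sbuilding is borrowed from the plans below; the real query is much larger):

-- IN: one placeholder per value, so the parameter count and the statement
-- text change with the list size (and long lists can hit the parameter limit)
SELECT b.* FROM sbuilding b WHERE b.id IN ($1, $2, $3);

-- ANY: the whole list is a single array parameter ($1 is an int8[]),
-- so the statement text stays stable regardless of how many ids we pass
SELECT b.* FROM sbuilding b WHERE b.id = ANY ($1);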

So far I've tried the following, without much success:

  • casting things to try to hint the planner
  • creating statistics on various involved tables
  • increasing the statistics target on involved columns
  • using unnest (this fixes the estimate, but then the query is about 20 times slower in every case, so not really a solution for me; see the sketch after this list)
  • a few dozen other "hacky" workarounds, like using a regex split to a table and the like
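The unnest variant I tried looks roughly like this (a sketch against the sbuilding table from the plans below; $1 is the int8[] parameter):

SELECT b.*
FROM unnest($1::int8[]) AS ids(id)
JOIN sbuilding b ON b.id = ids.id
WHERE b.deletedat IS NULL;

-- the join gives a sane row estimate, but in my case the resulting plan
-- was about 20 times slower than the custom plan for = ANY($1)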

Note that I do know why it switches to a generic plan after the 5th execution: the generic plan's estimated cost is lower than the average estimated cost of the 5 previous custom plans. What I don't know is how to fix that wrong estimate.
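The flip is easy to watch with EXPLAIN on a prepared statement (a sketch; t is a hypothetical table): while a custom plan is used, the Index Cond shows the literal array, and once the generic plan takes over it shows the parameter instead.

PREPARE q(int8[]) AS SELECT * FROM t WHERE id = ANY ($1);
EXPLAIN EXECUTE q('{1,2,3,4,5}');
-- executions 1-5 (custom):  Index Cond: (id = ANY ('{1,2,3,4,5}'::bigint[]))
-- execution 6+ (generic, if its estimated cost is lower on average):
--                           Index Cond: (id = ANY ($1))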

Here are the relevant rows of the custom plan:

->  Bitmap Heap Scan on sbuilding b  (cost=150.51..332.59 rows=50 width=29) (actual time=0.077..0.187 rows=50 loops=2)
    Recheck Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50}'::bigint[]))
    Filter: (deletedat IS NULL)
    Heap Blocks: exact=50
    ->  Bitmap Index Scan on sbuilding_pkey  (cost=0.00..150.50 rows=50 width=0) (actual time=0.065..0.066 rows=50 loops=2)
        Index Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50}'::bigint[]))

And the same part for the generic plan:

->  Index Scan using sbuilding_pkey on sbuilding b  (cost=0.28..82.84 rows=10 width=29) (actual time=0.049..0.229 rows=50 loops=2)
    Index Cond: (id = ANY ($2))
    Filter: (deletedat IS NULL)

I've replaced the actual ids with the numbers 1-50, but the real ids all exist, as they are retrieved from the db in a step before this one.

I'd prefer not to post the full query or query plans, as those contain a lot of sensitive information (it would be hundreds of words and values to replace), but I hope these parts are enough to show the issue, and that one of you will be able to assist.

Edit: I don't have any array extension like intarray installed explicitly - I found a mailing list thread regarding estimates with the intarray extension, so I thought I'd add this. I'm not sure, though, whether it still applies. Here's the link: https://www.postgresql.org/message-id/20150317191531.GE10492%40momjian.us

Thanks a lot!

Edit 2: I just retested with a super simple query, and it seems that the query planner always assumes 10 values for = ANY(:arrayParam) in generic plans. This seems to be the culprit in my case. Is there any way to "fix" that assumption? Preferably without installing any extension that needs config changes, like pg_hint_plan. Thanks!
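In case someone wants to reproduce it, this is roughly the kind of "super simple" test I mean (hypothetical table t; the force_* modes only serve to make it deterministic which plan you see):

CREATE TABLE t (id bigint PRIMARY KEY);
INSERT INTO t SELECT generate_series(1, 100000);
ANALYZE t;

PREPARE q(int8[]) AS SELECT * FROM t WHERE id = ANY ($1);

SET plan_cache_mode = force_custom_plan;
EXPLAIN EXECUTE q('{1,2,3,4,5}');  -- estimated rows=5, matching the array

SET plan_cache_mode = force_generic_plan;
EXPLAIN EXECUTE q('{1,2,3,4,5}');  -- estimated rows=10, the fixed default

RESET plan_cache_mode;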

Edit 3: I found the following in the PostgreSQL source code, in selfuncs.c:

/*
 * Arbitrarily assume 10 elements in the eventual array value (see
 * also estimate_array_length).  We don't risk an assumption of
 * disjoint probabilities here.
 */
for (i = 0; i < 10; i++)
{
    if (useOr)
        s1 = s1 + s2 - s1 * s2;
    else
        s1 = s1 * s2;
}

/*
 * Estimate number of elements in the array yielded by an expression.
 *
 * It's important that this agree with scalararraysel.
 */
int
estimate_array_length(Node *arrayexpr)
{
    /* look through any binary-compatible relabeling of arrayexpr */
    arrayexpr = strip_array_coercion(arrayexpr);

    if (arrayexpr && IsA(arrayexpr, Const))
    {
        Datum       arraydatum = ((Const *) arrayexpr)->constvalue;
        bool        arrayisnull = ((Const *) arrayexpr)->constisnull;
        ArrayType  *arrayval;

        if (arrayisnull)
            return 0;
        arrayval = DatumGetArrayTypeP(arraydatum);
        return ArrayGetNItems(ARR_NDIM(arrayval), ARR_DIMS(arrayval));
    }
    else if (arrayexpr && IsA(arrayexpr, ArrayExpr) &&
             !((ArrayExpr *) arrayexpr)->multidims)
    {
        return list_length(((ArrayExpr *) arrayexpr)->elements);
    }
    else
    {
        /* default guess --- see also scalararraysel */
        return 10;
    }
}
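That explains the two plans above: in a custom plan the parameter value is substituted as a Const, so the first branch counts the 50 elements exactly; in a generic plan the array is still a Param, the code falls through to the default branch, and 10 elements are assumed. A quick contrast (same hypothetical table t as in the repro above):

-- array literal (a Const): the element count is known, the estimate is exact
EXPLAIN SELECT * FROM t WHERE id = ANY ('{1,2,3,4,5}'::int8[]);  -- rows=5

-- parameter in a generic plan: no Const to inspect, the default guess wins
-- Index Cond: (id = ANY ($1))                                   -- rows=10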

I'll see where I can get from there, but I still hope someone has a solution that does not involve me going through the whole postgres source code ;)

griffin
  • I take it that this is all inside a function/procedure? Have you tried loading your $1 parameter into a temporary table, and then doing a join on the temp table? That way you should get around the arbitrary 10 estimate for `ANY`, as the main query would no longer use `ANY` at all, and you would still only be using a single parameter. Just a thought. – Jonathan Willcock Feb 03 '22 at 23:02
  • @JonathanWillcock I've tried that and yes it fixes the estimate, but then my query becomes about 5 times slower in any case, so it doesn't really help in the end. Also tried the same using unnest and various other ways of creating a temp table, cte, values list etc but they all led to the query being significantly slower :( – griffin Feb 04 '22 at 09:14
  • @JonathanWillcock wrt. your first question: It's passed via hibernate nativeQuery, so JDBC, which in return uses server side prepared statements in this case unless disabled (which I'd also prefer not to). – griffin Feb 04 '22 at 09:39
  • Using a permanent table with indexing and getting PG to update the statistics after truncating and filling the table might help the speed issues. – Jonathan Willcock Feb 06 '22 at 06:58
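For reference, the temp-table variant discussed in these comments looks roughly like this (names are illustrative; $1 stands for the array the application binds, and ON COMMIT DROP requires running inside a transaction):

BEGIN;
CREATE TEMP TABLE page_ids (id bigint PRIMARY KEY) ON COMMIT DROP;
INSERT INTO page_ids SELECT unnest($1::int8[]);
ANALYZE page_ids;  -- give the planner real row counts instead of a guess

SELECT b.*
FROM sbuilding b
JOIN page_ids p ON p.id = b.id
WHERE b.deletedat IS NULL;
COMMIT;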

2 Answers


That can be done by changing plan_cache_mode for this query:

BEGIN;
SET LOCAL plan_cache_mode = force_custom_plan;
SELECT /* your query */;
COMMIT;

That will make the optimizer always use a custom plan.

Laurenz Albe
  • Thanks for the answer, but I'd prefer a solution without forcing plan_cache_mode as written in my question. – griffin Feb 04 '22 at 09:12
  • Ah, right. I didn't see that. But I don't think there is a better way. If you use `SET LOCAL`, the change only applies to the current transaction, so there is no danger of messing up other execution plans. – Laurenz Albe Feb 04 '22 at 09:39
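To illustrate the scoping described in the last comment, a quick sketch:

BEGIN;
SET LOCAL plan_cache_mode = force_custom_plan;
SHOW plan_cache_mode;  -- force_custom_plan, but only inside this transaction
COMMIT;
SHOW plan_cache_mode;  -- back to the previous setting (auto by default)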

If you do not want to go for the solution of setting plan_cache_mode that was suggested by Laurenz Albe, I have this suggestion:

Add an additional condition as a literal to your prepared statement that

  • always evaluates to true and
  • changes each time.

For instance, you could use the epoch time in milliseconds:

SELECT ... FROM ... WHERE ... x.id = ANY($1) AND y.id = ANY($2) AND z.id = ANY($3) ... AND 1678193956123=1678193956123

SELECT ... FROM ... WHERE ... x.id = ANY($1) AND y.id = ANY($2) AND z.id = ANY($3) ... AND 1678193974456=1678193974456

SELECT ... FROM ... WHERE ... x.id = ANY($1) AND y.id = ANY($2) AND z.id = ANY($3) ... AND 1678193987789=1678193987789

Or you could use a hash of your input params etc.
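This works because the changing literal makes every statement text unique, so the server sees a fresh prepared statement each time and plans it as a custom plan; the five-execution threshold for considering a generic plan is never reached. If you want such a value, the epoch in milliseconds can be computed like this (a sketch; normally the client formats the literal itself when building the statement text):

SELECT (extract(epoch FROM clock_timestamp()) * 1000)::bigint AS epoch_millis;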

Peter
  • Thanks for the suggestion, but I can't really upvote it. It's more of a dirty workaround than a fix, and it still basically forces a custom plan. – griffin May 10 '23 at 11:54
  • As you wish. We are doing it this way and it works for us but I agree that an 'official' way would be nice. – Peter May 10 '23 at 15:01