5

I have two queries which do the same thing. 1

    SELECT *
    FROM "Products_product"
    WHERE ("Products_product"."id" IN
           (SELECT U0."product_id"
            FROM "Products_purchase" U0
            WHERE (U0."state" = 1
                   AND U0."user_id" = 5))
           AND "Products_product"."state" IN (1,
                                              6,
                                              3)
           AND UPPER("Products_product"."title" :: TEXT) LIKE UPPER('%toronto%'))
    ORDER BY "Products_product"."title_index" ASC
    LIMIT 10;

2

SELECT *
FROM "Products_product"
WHERE ("Products_product"."id" = ANY (ARRAY(
       (SELECT U0."product_id"
        FROM "Products_purchase" U0
        WHERE (U0."state" = 1
               AND U0."user_id" = 5))))
       AND "Products_product"."state" IN (1,
                                          6,
                                          3)
       AND UPPER("Products_product"."title" :: TEXT) LIKE UPPER('%toronto%'))
ORDER BY "Products_product"."title_index" ASC
LIMIT 10;

The only difference is the first one use IN for subquery while the second use =ANY(ARRAY()). But the second one is significant faster than the first about 10 times. I ran the explanation and I got these two results: 1

 Limit  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=3414.185..3414.190 rows=10 loops=1)
   ->  Sort  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=3414.184..3414.185 rows=10 loops=1)
         Sort Key: "Products_product".title
         Sort Method: quicksort  Memory: 57kB
         ->  Nested Loop Semi Join  (cost=92.66..5309.91 rows=1 width=1906) (actual time=3385.153..3414.099 rows=16 loops=1)
               ->  Bitmap Heap Scan on "Products_product"  (cost=13.85..256.32 rows=61 width=1906) (actual time=3381.327..3384.430 rows=63 loops=1)
                     Recheck Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
                     Rows Removed by Index Recheck: 1
                     Heap Blocks: exact=64
                     ->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=3381.001..3381.001 rows=64 loops=1)
                           Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
               ->  Bitmap Heap Scan on "Products_purchase" u0  (cost=78.82..82.84 rows=1 width=4) (actual time=0.467..0.467 rows=0 loops=63)
                     Recheck Cond: ((product_id = "Products_product".id) AND (user_id = 5))
                     Filter: (state = 1)
                     Heap Blocks: exact=16
                     ->  BitmapAnd  (cost=78.82..78.82 rows=1 width=0) (actual time=0.465..0.465 rows=0 loops=63)
                           ->  Bitmap Index Scan on "Products_purchase_product_id"  (cost=0.00..5.06 rows=84 width=0) (actual time=0.265..0.265 rows=30 loops=63)
                                 Index Cond: (product_id = "Products_product".id)
                           ->  Bitmap Index Scan on "Products_purchase_user_id"  (cost=0.00..72.57 rows=3752 width=0) (actual time=0.242..0.242 rows=3335 loops=51)
                                 Index Cond: (user_id = 5)
 Planning time: 7.540 ms
 Execution time: 3414.356 ms
(22 rows)

2

Limit  (cost=7378.07..7378.07 rows=1 width=1906) (actual time=116.559..116.562 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using "Products_purchase_user_id" on "Products_purchase" u0  (cost=0.43..7329.83 rows=3752 width=4) (actual time=0.021..15.535 rows=3335 loops=1)
           Index Cond: (user_id = 5)
           Filter: (state = 1)
   ->  Sort  (cost=48.24..48.25 rows=1 width=1906) (actual time=116.558..116.559 rows=10 loops=1)
         Sort Key: "Products_product".title
         Sort Method: quicksort  Memory: 57kB
         ->  Bitmap Heap Scan on "Products_product"  (cost=44.20..48.23 rows=1 width=1906) (actual time=116.202..116.536 rows=16 loops=1)
               Recheck Cond: ((id = ANY ($0)) AND (upper((title)::text) ~~ '%TORONTO%'::text))
               Filter: (state = ANY ('{1,6,3}'::integer[]))
               Rows Removed by Filter: 2
               Heap Blocks: exact=18
               ->  Bitmap Index Scan on "Products_product_id_upper_idx1"  (cost=0.00..44.20 rows=1 width=0) (actual time=116.103..116.103 rows=18 loops=1)
                     Index Cond: ((id = ANY ($0)) AND (upper((title)::text) ~~ '%TORONTO%'::text))
 Planning time: 1.054 ms
 Execution time: 116.663 ms
(17 rows)

From doc, there's no substantial difference between IN or ANY. But why I get so difference results. Is it ANY has advantage over IN in any case?

Update: Someone pointed this question may be duplicated with IN vs ANY operator in PostgreSQL. They are kind of same question, but the answer in that question didn't solve my problem, because I have a more detail case beyond that answer.

But the second variant of each is not equivalent to the other. The second variant of the ANY construct takes an array (must be an actual array type), while the second variant of IN takes a comma-separated list of values. This leads to different restrictions in passing values and can also lead to different query plans in special cases:

https://dba.stackexchange.com/a/125500/3684

Pass multiple sets or arrays of values to a function

In my question, it's not the case with either question. I am just passing one array as subquery. And my case is exactly opposite to the first URL. My index is only used in ANY but not in IN. So basically, this answer did not solve my problem.

UPDATE2: I update the index: CREATE INDEX ON "Products_product" USING GIST (state, id, upper((title) :: TEXT) gist_trgm_ops);. I can confirm the circumstance is equal for both query, that means the index existed there, but the first doesn't use it.

UPDATE3: I just remove ARRAY in the code. But result is same.

 explain analyze SELECT *
FROM "Products_product"
WHERE ("Products_product"."id" = ANY(
       (SELECT U0."product_id"
        FROM "Products_purchase" U0
        WHERE (U0."state" = 1
               AND U0."user_id" = 5)))
       AND "Products_product"."state" IN (1,
                                          6,
                                          3)
       AND UPPER("Products_product"."title" :: TEXT) LIKE UPPER('%toronto%'))
ORDER BY "Products_product"."title" ASC
LIMIT 10;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=228.980..228.983 rows=10 loops=1)
   ->  Sort  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=228.979..228.980 rows=10 loops=1)
         Sort Key: "Products_product".title
         Sort Method: quicksort  Memory: 57kB
         ->  Nested Loop Semi Join  (cost=92.66..5309.91 rows=1 width=1906) (actual time=216.392..228.913 rows=16 loops=1)
               ->  Bitmap Heap Scan on "Products_product"  (cost=13.85..256.32 rows=61 width=1906) (actual time=214.332..215.260 rows=63 loops=1)
                     Recheck Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
                     Rows Removed by Index Recheck: 1
                     Heap Blocks: exact=64
                     ->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=214.296..214.296 rows=64 loops=1)
                           Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
               ->  Bitmap Heap Scan on "Products_purchase" u0  (cost=78.82..82.84 rows=1 width=4) (actual time=0.215..0.215 rows=0 loops=63)
                     Recheck Cond: ((product_id = "Products_product".id) AND (user_id = 5))
                     Filter: (state = 1)
                     Heap Blocks: exact=16
                     ->  BitmapAnd  (cost=78.82..78.82 rows=1 width=0) (actual time=0.212..0.212 rows=0 loops=63)
                           ->  Bitmap Index Scan on "Products_purchase_product_id"  (cost=0.00..5.06 rows=84 width=0) (actual time=0.017..0.017 rows=30 loops=63)
                                 Index Cond: (product_id = "Products_product".id)
                           ->  Bitmap Index Scan on "Products_purchase_user_id"  (cost=0.00..72.57 rows=3752 width=0) (actual time=0.239..0.239 rows=3335 loops=51)
                                 Index Cond: (user_id = 5)
 Planning time: 5.083 ms
 Execution time: 229.904 ms
(22 rows)

I think it's not the case with ANY or ANY(ARRAY()), it's just the difference between IN and ANY

Bouke Versteegh
  • 4,097
  • 1
  • 39
  • 35
Tiancheng Liu
  • 782
  • 2
  • 9
  • 22
  • 2
    Possible duplicate of [IN vs ANY operator in PostgreSQL](https://stackoverflow.com/questions/34627026/in-vs-any-operator-in-postgresql) – JGH Jul 07 '19 at 11:39
  • 1
    @JGH I update the question, I don't think it's duplicate. – Tiancheng Liu Jul 08 '19 at 00:18
  • 1
    The optimizer will rewrite an IN clause with constant values to an any clause, e.g. `where x in (1,2,3)` will be rewritten to `where x = any (array[1,2,3])` because in general `= ANY()` is faster than `IN` - at least for constant values. I am surprised that this is also the case for IN together with a subquery. You might want to ask this on the [Postgres performance mailing list](https://www.postgresql.org/list/) as the core developers read that and can probably shed more light on that –  Jul 08 '19 at 06:03
  • Unrelated, but: you really should avoid those dreaded quoted identifiers. They are much more trouble in the long run than they are worth it. –  Jul 08 '19 at 06:03

1 Answers1

2

The different plans are not caused by IN vs. = ANY, but by the additional ARRAY() around the subselect in the second query. Without that, the plans are identical.

The difference is that in the slow execution, the index scan takes a long time, while in the (exactly identical) plan from your edit the same scan is fast:

Slow:

->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=3381.001..3381.001 rows=64 loops=1)
      Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))

Fast:

->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=214.296..214.296 rows=64 loops=1)
      Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))

It is also interesting to note that in the slow plan, it took 3 seconds to produce the first row of the index scan.

Whatever the problem is, it is transient. The only suspicion that comes to mind are killed index tuples: a mass delete left many index tuples that point to dead heap tuples which only have to be scanned the first time, because after that they are marked as dead.

Did you have any mass deletes?

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Products_product_state_id_upper_idx is the trigram index on "Products_product".title. But it's actually a 3 column index with state, title and id. Those two queries are run exactly in same circumstance. Even I clean the cache, use IN always 10 times slower than ANY. – Tiancheng Liu Jul 08 '19 at 13:00
  • I repeat, the queries are not identical, as I said in the answer. Maybe things become clearer if you edit the question and add the `CREATE INDEX` statements. – Laurenz Albe Jul 08 '19 at 13:19
  • Hmm, I think those two statements are the same. Isn't that the same kind of rewriting as with `where x in (1,2,3)` and `where x = any(array[1,2,3])`? –  Jul 08 '19 at 14:15
  • Yes, but `WHERE x IN (SELECT something)` (which is transformed to `WHERE x = ANY (SELECT something)`) is different from `WHERE x = ANY (array(SELECT something))`. – Laurenz Albe Jul 08 '19 at 14:18
  • I have updated the answer. So, contrary to what you are saying, removing the `array()` caused the plans to become identical, and the cause of the slow index scan has to be sought elsewhere. – Laurenz Albe Jul 08 '19 at 21:48