4

We have a Product table in postgres DB. This is hosted on Heroku. We have 8 GB RAM and 250 GB disk space. 1000 IPOP allowed. We are having proper indexes on columns.

Platform

PostgreSQL 9.5.12 on x86_64-pc-linux-gnu (Ubuntu 9.5.12-1.pgdg14.04+1), compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit

We are running a keywords search query on this table. We are having 2.8 millions records in this table. Our search query is too slow. Its giving us result in about 50 seconds. Which is too slow.

Query

SELECT
    P .sfid AS prodsfid,
    P .image_url__c image,
    P .productcode sku,
    P .Short_Description__c shortDesc,
    P . NAME pname,
    P .category__c,
    P .price__c price,
    P .description,
    P .vendor_name__c vname,
    P .vendor__c supSfid
FROM
    staging.product2 P
JOIN (
    SELECT
        p1.sfid
    FROM
        staging.product2 p1
    WHERE
        p1. NAME ILIKE '%s%'
    OR p1.productcode ILIKE '%s%'
) AS TEMP ON (P .sfid = TEMP .sfid)
WHERE
    P .status__c = 'Available'
AND LOWER (
    P .vendor_shipping_country__c
) = ANY (

    VALUES
        ('us'),
        ('usa'),
        ('united states'),
        ('united states of america')
)
AND P .vendor_catalog_tier__c = ANY (

    VALUES
        ('a1c37000000oljnAAA'),
        ('a1c37000000oljQAAQ'),
        ('a1c37000000oljQAAQ'),
        ('a1c37000000pT7IAAU'),
        ('a1c37000000omDjAAI'),
        ('a1c37000000oljMAAQ'),
        ('a1c37000000oljaAAA'),
        ('a1c37000000pT7SAAU'),
        ('a1c0R000000AFcVQAW'),
        ('a1c0R000000A1HAQA0'),
        ('a1c0R0000000OpWQAU'),
        ('a1c0R0000005TZMQA2'),
        ('a1c37000000oljdAAA'),
        ('a1c37000000ooTqAAI'),
        ('a1c37000000omLBAAY'),
        ('a1c0R0000005N8GQAU')
)

Here is the explain plan:

Nested Loop  (cost=31.85..33886.54 rows=3681 width=750)
  ->  Hash Join  (cost=31.77..31433.07 rows=4415 width=750)
        Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
        ->  Nested Loop  (cost=31.73..31423.67 rows=8830 width=761)
              ->  HashAggregate  (cost=0.06..0.11 rows=16 width=32)
                    Group Key: "*VALUES*_1".column1
                    ->  Values Scan on "*VALUES*_1"  (cost=0.00..0.06 rows=16 width=32)
              ->  Bitmap Heap Scan on product2 p  (cost=31.66..1962.32 rows=552 width=780)
                    Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
                    Filter: ((status__c)::text = 'Available'::text)
                    ->  Bitmap Index Scan on vendor_catalog_tier_prd_idx  (cost=0.00..31.64 rows=1016 width=0)
                          Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
        ->  Hash  (cost=0.03..0.03 rows=4 width=32)
              ->  Unique  (cost=0.02..0.03 rows=4 width=32)
                    ->  Sort  (cost=0.02..0.02 rows=4 width=32)
                          Sort Key: "*VALUES*".column1
                          ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=4 width=32)
  ->  Index Scan using sfid_prd_idx on product2 p1  (cost=0.09..0.55 rows=1 width=19)
        Index Cond: ((sfid)::text = (p.sfid)::text)
        Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))

Its returning around 140,576 records. By the way we need only top 5,000 records only. Will putting Limit help here?

Let me know how to make it fast and what is causing this slow.

EXPLAIN ANALYZE

@RaymondNijland Here is the explain analyze

Nested Loop  (cost=31.83..33427.28 rows=4039 width=750) (actual time=1.903..4384.221 rows=140576 loops=1)
  ->  Hash Join  (cost=31.74..30971.32 rows=4369 width=750) (actual time=1.852..1094.964 rows=164353 loops=1)
        Hash Cond: (lower((p.vendor_shipping_country__c)::text) = "*VALUES*".column1)
        ->  Nested Loop  (cost=31.70..30962.02 rows=8738 width=761) (actual time=1.800..911.738 rows=164353 loops=1)
              ->  HashAggregate  (cost=0.06..0.11 rows=16 width=32) (actual time=0.012..0.019 rows=15 loops=1)
                    Group Key: "*VALUES*_1".column1
                    ->  Values Scan on "*VALUES*_1"  (cost=0.00..0.06 rows=16 width=32) (actual time=0.004..0.005 rows=16 loops=1)
              ->  Bitmap Heap Scan on product2 p  (cost=31.64..1933.48 rows=546 width=780) (actual time=26.004..57.290 rows=10957 loops=15)
                    Recheck Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
                    Filter: ((status__c)::text = 'Available'::text)
                    Rows Removed by Filter: 645
                    Heap Blocks: exact=88436
                    ->  Bitmap Index Scan on vendor_catalog_tier_prd_idx  (cost=0.00..31.61 rows=1000 width=0) (actual time=24.811..24.811 rows=11601 loops=15)
                          Index Cond: ((vendor_catalog_tier__c)::text = "*VALUES*_1".column1)
        ->  Hash  (cost=0.03..0.03 rows=4 width=32) (actual time=0.032..0.032 rows=4 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Unique  (cost=0.02..0.03 rows=4 width=32) (actual time=0.026..0.027 rows=4 loops=1)
                    ->  Sort  (cost=0.02..0.02 rows=4 width=32) (actual time=0.026..0.026 rows=4 loops=1)
                          Sort Key: "*VALUES*".column1
                          Sort Method: quicksort  Memory: 25kB
                          ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=4 width=32) (actual time=0.001..0.002 rows=4 loops=1)
  ->  Index Scan using sfid_prd_idx on product2 p1  (cost=0.09..0.56 rows=1 width=19) (actual time=0.019..0.020 rows=1 loops=164353)
        Index Cond: ((sfid)::text = (p.sfid)::text)
        Filter: (((name)::text ~~* '%s%'::text) OR ((productcode)::text ~~* '%s%'::text))
        Rows Removed by Filter: 0
Planning time: 2.488 ms
Execution time: 4391.378 ms

Another query version, with order by , but it seems very slow as well (140 seconds)

SELECT
P .sfid AS prodsfid,
P .image_url__c image,
P .productcode sku,
P .Short_Description__c shortDesc,
P . NAME pname,
P .category__c,
P .price__c price,
P .description,
P .vendor_name__c vname,
P .vendor__c supSfid
FROM
staging.product2 P
WHERE
P .status__c = 'Available'
AND P .vendor_shipping_country__c IN (
'us',
'usa',
'united states',
'united states of america'
)
AND P .vendor_catalog_tier__c IN (
'a1c37000000omDQAAY',
'a1c37000000omDTAAY',
'a1c37000000omDXAAY',
'a1c37000000omDYAAY',
'a1c37000000omDZAAY',
'a1c37000000omDdAAI',
'a1c37000000omDfAAI',
'a1c37000000omDiAAI',
'a1c37000000oml6AAA',
'a1c37000000oljPAAQ',
'a1c37000000oljRAAQ',
'a1c37000000oljWAAQ',
'a1c37000000oljXAAQ',
'a1c37000000oljZAAQ',
'a1c37000000oljcAAA',
'a1c37000000oljdAAA',
'a1c37000000oljlAAA',
'a1c37000000oljoAAA',
'a1c37000000oljqAAA',
'a1c37000000olnvAAA',
'a1c37000000olnwAAA',
'a1c37000000olnxAAA',
'a1c37000000olnyAAA',
'a1c37000000olo0AAA',
'a1c37000000olo1AAA',
'a1c37000000olo4AAA',
'a1c37000000olo8AAA',
'a1c37000000olo9AAA',
'a1c37000000oloCAAQ',
'a1c37000000oloFAAQ',
'a1c37000000oloIAAQ',
'a1c37000000oloJAAQ',
'a1c37000000oloMAAQ',
'a1c37000000oloNAAQ',
'a1c37000000oloSAAQ',
'a1c37000000olodAAA',
'a1c37000000oloeAAA',
'a1c37000000olzCAAQ',
'a1c37000000om0xAAA',
'a1c37000000ooV1AAI',
'a1c37000000oog8AAA',
'a1c37000000oogDAAQ',
'a1c37000000oonzAAA',
'a1c37000000oluuAAA',
'a1c37000000pT7SAAU',
'a1c37000000oljnAAA',
'a1c37000000olumAAA',
'a1c37000000oljpAAA',
'a1c37000000pUm2AAE',
'a1c37000000olo3AAA',
'a1c37000000oo1MAAQ',
'a1c37000000oo1vAAA',
'a1c37000000pWxgAAE',
'a1c37000000pYJkAAM',
'a1c37000000omDjAAI',
'a1c37000000ooTgAAI',
'a1c37000000op2GAAQ',
'a1c37000000one0AAA',
'a1c37000000oljYAAQ',
'a1c37000000pUlxAAE',
'a1c37000000oo9SAAQ',
'a1c37000000pcIYAAY',
'a1c37000000pamtAAA',
'a1c37000000pd2QAAQ',
'a1c37000000pdCOAAY',
'a1c37000000OpPaAAK',
'a1c37000000OphZAAS',
'a1c37000000olNkAAI'

)
ORDER BY p.productcode asc
LIMIT 5000

Here is the explain analyse for this:

Limit (cost=0.09..45271.54 rows=5000 width=750) (actual time=48593.355..86376.864 rows=5000 loops=1)
-> Index Scan using productcode_prd_idx on product2 p (cost=0.09..743031.39 rows=82064 width=750) (actual time=48593.353..86376.283 rows=5000 loops=1)
Filter: (((status__c)::text = 'Available'::text) AND ((vendor_shipping_country__c)::text = ANY ('{us,usa,"united states","united states of america"}'::text[])) AND ((vendor_catalog_tier__c)::text = ANY ('{a1c37000000omDQAAY,a1c37000000omDTAAY,a1c37000000omDXAAY,a1c37000000omDYAAY,a1c37000000omDZAAY,a1c37000000omDdAAI,a1c37000000omDfAAI,a1c37000000omDiAAI,a1c37000000oml6AAA,a1c37000000oljPAAQ,a1c37000000oljRAAQ,a1c37000000oljWAAQ,a1c37000000oljXAAQ,a1c37000000oljZAAQ,a1c37000000oljcAAA,a1c37000000oljdAAA,a1c37000000oljlAAA,a1c37000000oljoAAA,a1c37000000oljqAAA,a1c37000000olnvAAA,a1c37000000olnwAAA,a1c37000000olnxAAA,a1c37000000olnyAAA,a1c37000000olo0AAA,a1c37000000olo1AAA,a1c37000000olo4AAA,a1c37000000olo8AAA,a1c37000000olo9AAA,a1c37000000oloCAAQ,a1c37000000oloFAAQ,a1c37000000oloIAAQ,a1c37000000oloJAAQ,a1c37000000oloMAAQ,a1c37000000oloNAAQ,a1c37000000oloSAAQ,a1c37000000olodAAA,a1c37000000oloeAAA,a1c37000000olzCAAQ,a1c37000000om0xAAA,a1c37000000ooV1AAI,a1c37000000oog8AAA,a1c37000000oogDAAQ,a1c37000000oonzAAA,a1c37000000oluuAAA,a1c37000000pT7SAAU,a1c37000000oljnAAA,a1c37000000olumAAA,a1c37000000oljpAAA,a1c37000000pUm2AAE,a1c37000000olo3AAA,a1c37000000oo1MAAQ,a1c37000000oo1vAAA,a1c37000000pWxgAAE,a1c37000000pYJkAAM,a1c37000000omDjAAI,a1c37000000ooTgAAI,a1c37000000op2GAAQ,a1c37000000one0AAA,a1c37000000oljYAAQ,a1c37000000pUlxAAE,a1c37000000oo9SAAQ,a1c37000000pcIYAAY,a1c37000000pamtAAA,a1c37000000pd2QAAQ,a1c37000000pdCOAAY,a1c37000000OpPaAAK,a1c37000000OphZAAS,a1c37000000olNkAAI}'::text[])))
Rows Removed by Filter: 1707920
Planning time: 1.685 ms
Execution time: 86377.139 ms

Thanks

Aslam Bari

Aslam Bari
  • 41
  • 2
  • 2
    Can you also show `EXPLAIN ANALYSE query`. And the `CREATE TABLE` structure(s) ( https://stackoverflow.com/questions/2593803/how-to-generate-the-create-table-sql-statement-for-an-existing-table-in-postgr ) – Raymond Nijland May 01 '18 at 13:02
  • Unrelated, but why are you putting a space between your table/alias name and the dot? And, don't use uppercase identifiers like that. If you actually want uppercase, you need to put it in quotes. Otherwise use lowercase. – 404 May 01 '18 at 15:45
  • @eurotrash I just copy pasted the query from postgres client, it gives that spaces. And regarding dot, actually that is a schema "staging" – Aslam Bari May 02 '18 at 14:07
  • @RaymondNijland Explain Analyze added – Aslam Bari May 02 '18 at 14:11

1 Answers1

0

You might want to consider a GIN or GIST index on your staging.product2 table. Double-sided ILIKEs are slow and difficult to improve substantially. I've seen a GIN index improve a similar query by 60-80%.

See this doc.

djkern
  • 452
  • 3
  • 11