3

I am having issues making postgres use my multi-column indexes for full search using the btree_gin extension. This is for a search page for articles. The idea behind the use of btree_gin is to be able to get the 'id' field for sorting and magazine_id as a filter:

CREATE INDEX idx_gin_search ON article USING gin(id, magazine_id, search_vector_full) WITH (fastupdate = off);

Postgres decides to use a btree index on magazine instead and then filter (=slow):

Executed SQL
SELECT ••• FROM article WHERE (( (article.search_vector) @@    
(plainto_tsquery('pg_catalog.english', 'interesting'))) AND    
article.magazine_id = 7) ORDER BY article.id ASC LIMIT 36
Time 13.4780406952 ms

QUERY PLAN
Limit  (cost=2021.87..2021.96 rows=36 width=384) (actual time=9.782..9.787 rows=36 loops=1)
  ->  Sort  (cost=2021.87..2027.49 rows=2248 width=384) (actual time=9.781..9.784 rows=36 loops=1)
    Sort Key: id
    Sort Method: top-N heapsort  Memory: 53kB
    ->  Index Scan using idx_magazine_id on article (cost=0.29..1952.53 rows=2248 width=384) (actual time=0.035..8.924 rows=2249 loops=1)
          Index Cond: (magazine_id = 7)
          Filter: (search_vector @@ '''interesting'''::tsquery)
          Rows Removed by Filter: 11413
Planning time: 4.600 ms
Execution time: 9.860 ms

Then, what I find even understand less, is that it also refuses to use this simple btree index on the LIST page for articles, where they are just listed x per page in descending order:

CREATE INDEX idx_btree_listing ON article USING btree(id DESC, magazine_id);

Again it doesn't use the multi-column index:

Executed SQL
SELECT ••• FROM article WHERE article.magazine_id = 7
ORDER BY article.id DESC LIMIT 36
Time 1.4750957489 ms

QUERY PLAN
Limit  (cost=0.29..7.48 rows=36 width=384) (actual time=0.034..0.115 rows=36 loops=1)
->  Index Scan Backward using idx_magazine_id on article  (cost=0.29..2729.56 rows=13662 width=384) (actual time=0.031..0.107 rows=36 loops=1)
    Filter: (magazine_id = 7)
    Planning time: 1.354 ms
    Execution time: 0.207 ms

EDIT: The above is a development setup with less records and only 1 magazine, hence the fast speed. Here is a log produced by auto_explain on the production server:

duration: 230.629 ms  plan:
SELECT article.id, article.title, article.date, article.content FROM article WHERE article.magazine_id = 7 ORDER BY article.id DESC LIMIT 36

Limit  (cost=0.42..43.67 rows=36 width=306) (actual time=229.876..229.995 rows=36 loops=1)
    ->  Index Scan Backward using idx_magazine_id on article (cost=0.42..239539.22 rows=199379 width=306) (actual time=229.866..229.968 rows=36 loops=1)
    Filter: (article.magazine_id = 7)
    Rows Removed by Filter: 116414

I would be grateful anyone could give me tips for my further debugging on this one.

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
tdma
  • 182
  • 1
  • 12
  • 1
    Why should the second statement use the b-tree if the result can completely retrieved using the index `mmxxx_sitevideo_pkey` - with a runtime of 0.2 milliseconds this seems fast enough to me. Planning time took way longer than that, but the total runtime is still only 1.5 **milli** seconds. What performance do you expect? How fast do you need that to be? Although I have to admit the planning times (especially the first one) seem quite high for such simple statements. –  Feb 02 '16 at 06:58
  • Yes what you say makes absolute sense, however I should clarify that these are from a development setup with much less data than the production server. On the production server I've been using the auto_explain module and get the same execution plan. In my development DB I only have one magazine_id, on the real server I get the same query and then it takes longer, sorry I should have added the production query, have now added it to the original message. (Also forgot to rename that index you mention to idx_magazine_id for "simplification" ;-)) – tdma Feb 02 '16 at 11:25
  • Then please post the execution plan (using `explain (analyze, verbose)` from production. –  Feb 02 '16 at 12:48
  • I already have auto_explain.log_analyze = true and auto_explain.log_verbose = true in the initial post. – tdma Feb 02 '16 at 20:09
  • Apparently pgsql only supports index sorting in btree (source: http://www.postgresql.org/docs/9.1/static/indexes-ordering.html) so the "full" index of my query 1 would never work. By putting the filter fields in a btree index and the full-text search field seperate in a gin field, I get it running quickly now with bitmap anding, but ONLY without order_by. Pgsql does not support bitmap and-ing between a sorted multi-column btree and a gin index? – tdma Feb 02 '16 at 23:33

1 Answers1

1

The first column in your multi column index is id. You do not filter on id so postgres won't use that index. You do not have to filter on all columns in the index but the columns on which you do filter have to be the first n columns in the index.

Try experimenting with variations of the index you have like moving id to the end or ommiting id from the index.

Eelke
  • 20,897
  • 4
  • 50
  • 76
  • Hi, thanks for your feedback. I have tried reversing the order of the index fields and also without the id field as you suggest, but the execution plan is the same, and remains well above 100 ms... I guess the sort operation is the problem. Do you know if that query (I'm referring to the second one in my original question) would überhaubt support going all from the index? – tdma Feb 02 '16 at 19:57
  • OK. Query 2 problem is SOLVED. The order_by was confusing the planner. I tried everything to make it work with the query as it was, including changing the default_statistics_target and other parameters of the planner. It kept making the wrong choice (raises the thought again on whether postgres should have index hints). In the end I made a new column called id_sort, with identical content as id, and an index btree(magazine_id, id_sort desc) which is nicely used and runs in a few ms. It's rather absurd though. – tdma Feb 02 '16 at 21:33