10

I am testing performance for PostgreSQL full text search (using pg_search gem) and solr (sunspot_solr gem).

For 4 million records I am getting 13456 ms for Tsearch and 800 ms with SOLR (that is SOLR query + DB retrival). It is obvious that I need index but I am not sure how to create one for full text search. I investigated and found that for full text search I should use GIN index.

execute "CREATE INDEX products_gin_title ON products USING GIN(to_tsvector('english', title))"

But I am searching via two more columns and I need multi-value index and I am not sure how to implement it? I am not very familiar with DB part. My search code looks like:

@results = Product.search_title(params[:search_term]).where("platform_id=? AND product_type=?", params[:platform_id], params[:type_id]).limit(10).all

How do I create proper query for this type of situations?

Here is SQL output from rails for search term car.

Product Load (12494.0ms)
SELECT 
    "products".*, 
    ( ts_rank((to_tsvector('simple', coalesce("products"."title"::text, ''))), (to_ tsquery('simple', ''' ' || 'car' || ' ''')), 2) ) AS pg_search_rank 
FROM "products" 
WHERE (((to_tsvector('simple', coalesce("products"."tit le"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'car' || ' ''')))) 
    AND (platform_id='26' AND product_type='2') 
ORDER BY pg_search_rank DESC, "products"."id" ASC 
LIMIT 10

EDIT:

I am using PostgreSQL 8.4.11, EXPLAIN ANALYZE output is following.

Limit  (cost=108126.34..108126.36 rows=10 width=3824) (actual time=12228.736..12228.738 rows=10 loops=1)   
->  Sort (cost=108126.34..108163.84 rows=14999 width=3824) (actual time=12228.733..12228.734 rows=10 loops=1)
    Sort Key: (ts_rank(to_tsvector('simple'::regconfig, COALESCE((title)::text, ''::text)), '''car'''::tsquery, 2)), id
    Sort Method:  top-N heapsort  Memory: 18kB
    ->  Seq Scan on products  (cost=0.00..107802.22 rows=14999 width=3824) (actual time=7.532..12224.585 rows=977 loops=1)
        Filter: ((platform_id = 26) AND (product_type = 2) AND (to_tsvector('simple'::regconfig, COALESCE((title)::text, ''::text)) @@ '''car'''::tsquery)) 

Total runtime: 12228.813 ms
Quassnoi
  • 413,100
  • 91
  • 616
  • 614
Haris Krajina
  • 14,824
  • 12
  • 64
  • 81
  • Please post the output of running `EXPLAIN ANALYZE` on the posted query, plus your Pg version, any postgresql.conf params you've changed, etc. – Craig Ringer Jun 04 '12 at 11:50

1 Answers1

10

This expression:

to_tsvector('simple', (COALESCE(title::TEXT), ''))

is not sargable against your index.

You should declare the index on the exactly that expression which is used in the query:

CREATE INDEX products_gin_title
ON products
USING GIN(to_tsvector('simple', COALESCE(title::TEXT,'')))

(or make ruby generate the expression which is used in the index).

If you want multiple columns to be indexed, just concatenate them:

CREATE INDEX products_gin_title
ON products
USING GIN(to_tsvector('simple', title || ' ' || product_type || ' ' || platform_id))

but again, Ruby should be filtering on exactly same expression for the index to be of use.

Haris Krajina
  • 14,824
  • 12
  • 64
  • 81
Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • Thank you fist single column index worked now query time is 80 ms, nice! Question for you in addition to this is should it be multi column index to further increase speed? When you say concatenate them, does that mean I should do full text search by concatenating FK ids to my search term? – Haris Krajina Jun 04 '12 at 12:34
  • In this example I am searching for title via full text, and also narrowing results with product_type, platform_id (mentioned FKs). I am under impression that if you want to speed up specific query you need to add index for ALL columns used in that query. – Haris Krajina Jun 04 '12 at 12:55
  • 1
    @Dolphin: you cannot create composite `GIN` indexes. Well, you can but it's a very complex task which requires creating custom operator classes and other dark magic. This also requires `SUPER` privilege (which means you cannot do this on hosted databases etc.) – Quassnoi Jun 04 '12 at 13:25
  • This cut my search time down from 3000ms to 60ms. Thank you. – LMo Nov 13 '14 at 04:46