3

Using Postgres 9.5 and I'm trying to improve the performance of searches using the LIKE operator on the column containing a URL.

Consider a table like this with about 50 million rows

CREATE TABLE page_hit (
    timestamp_ timestamp without time zone NOT NULL,
    location_url character varying(2048)
)

and a query like this

SELECT count(*) 
FROM page_hit 
WHERE location_url LIKE 'http://mysite.foo.com/path/morepath%'

The above query makes use of the following index

CREATE INDEX location_idx
  ON page_hit
  USING btree
  (location_url varchar_pattern_ops);

The works great..BUT... to support LIKE wild cards in other parts of the string (for example a leading %) I created a GIN index as follows

CREATE INDEX location_idx_gin
  ON page_hit
  USING gin
  (location COLLATE pg_catalog."default" gin_trgm_ops);

The problem is when I run EXPLAIN ANALYZE on the above query now its only ever using location_idx_gin and ignoring location_idx which kinda sucks cause the location_idx index is faster at trailing % queries. The query planner seems to ignore my BTREE index in all cases and uses the GIN index.

The BTREE index outperforms the GIN index (in the case of a trailing %) but I want the planner to pick the GIN index only in the case of leading %. Is there a way to do this?

Some metrics (queries trailing %):

Mesbah Gueffaf
  • 518
  • 1
  • 7
  • 21
maxTrialfire
  • 532
  • 3
  • 16
  • @JuanCarlosOropeza I am working with full text search: the GIN index. Is there a better way to tokenize the url for indexing? – maxTrialfire Jun 28 '16 at 20:17
  • I told you leading `%` wont use index, only at the end. I guess if you want is possible split the string by `/` and store the url in a parent-child table. – Juan Carlos Oropeza Jun 28 '16 at 20:20
  • I already know leading % wont work. I said so in my question. That was my motivation for using a GIN index. But the optimiser always picks the GIN index over the BTREE index. I will update my question to clarify – maxTrialfire Jun 28 '16 at 20:38
  • Is the text search working? what time is perfoming? what is the problem? – Juan Carlos Oropeza Jun 28 '16 at 20:52
  • The BTREE index outperforms the GIN index (in the case of a trailing %) but I want the planner to pick the GIN index in the case of leading %. When both indexes are in place the planner always picks the GIN index, ignoring the BTREE index – maxTrialfire Jun 28 '16 at 20:58
  • Outperform for how much? Performance questions should include `EXPLAIN ANALYZE` and some information about table size, index, current time performance, desire time, etc. `Slow` is a relative term and we need a real value to compare. – Juan Carlos Oropeza Jun 28 '16 at 21:00
  • Updated question to include timing and explain output – maxTrialfire Jun 28 '16 at 21:13
  • Maybe check this [**answer**](http://stackoverflow.com/a/309814/3470178) But looks like there isnt any way you can do that. They assume the planner is smarter than you and wont give you that option. Even when in this case looks like is taking the wrong decision. – Juan Carlos Oropeza Jun 28 '16 at 21:57
  • Have you tried table inheritance? What if you created a parent table with no indexes defined which you'd leave empty, and two child tables, one for which you'd defined location_idx and into which you'd insert only entries with a trailing percent sign, and another for which you'd define location_idx_gin and you'd insert the rest of the entries? Surely querying the parent table would defer to querying the children tables, each of which has the right index and the right kind of records in it? – jchevali Dec 08 '18 at 21:50
  • @jchevali have not tried that - seems like reasonable solution. In the end we parsed the URL into its various parts (domain, path, query, fragment) and we index them separately – maxTrialfire Dec 10 '18 at 20:27

0 Answers0