34

I am trying to run a fulltext query using Postgresql that can cater for partial matches using wildcards.

It seems easy enough to have a postfix wildcard after the search term, however I cannot figure out how to specify a prefix wildcard.

For example, I can perform a postfix search easily enough using something like..

SELECT "t1".* 
FROM "t1" 
WHERE (to_tsvector('simple', "t1"."city") @@ to_tsquery('simple', 'don:*') )

should return results matching "London"

However I cant seem to do a prefix search like...

SELECT "t1".* 
FROM "t1" 
WHERE (to_tsvector('simple', "t1"."city") @@ to_tsquery('simple', ':*don') )

Ideally I'd like to have a wildcard prefixed to the front and end of the search term, something like...

SELECT "t1".* 
FROM "t1" 
WHERE (to_tsvector('simple', "t1"."city") @@ to_tsquery('simple', ':*don:*') )

I can use a LIKE condition however I was hoping to benefit from the performance of the full text search features in Postgres.

Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
Paul Carey
  • 516
  • 1
  • 4
  • 15

2 Answers2

22

Full text search is good for finding words, not substrings.

For substring searches you'd better use like '%don%' with pg_trgm extension available from PostgreSQL 9.1 and using gin (column_name gin_trgm_ops) or using gist (column_name gist_trgm_ops) indexes. But your index would be very big (even several times bigger than your table) and write performance not very good.

There's a very good example of using pg_trgm for substring search on select * from depesz blog.

Tometzky
  • 22,573
  • 5
  • 59
  • 73
  • Thanks for the answer, we've implemented something similar already in terms of the query so with the addition of the trigrams hopefully this should give us the performance gain we require. Thanks again. – Paul Carey Oct 26 '12 at 08:33
  • how to do using gist (column_name gist_trgm_ops) on 2 columns instead of one? – Michał Piotr Stankiewicz Feb 13 '16 at 10:34
10

One wild and crazy way of doing it would be to create a tsvector index of all your documents, reversed. And reverse your queries for postfix search too.

This is essentially what Solr does with its ReversedWildcardFilterFactory

select
reverse('brown fox')::tsvector @@ (reverse('rown') || ':*')::tsquery --true
Neil McGuigan
  • 46,580
  • 12
  • 123
  • 152
  • 3
    Unfortunately if you will query `row` instead of `rown` it will not return results. The reason is that it will check from end to start, but again only from first (last in this situation) letter, and never from the middle. – Bernard Potocki Jul 02 '15 at 13:03
  • @BernardPotocki not in the spec ;) Full text search is hard enough without substrings. If you want to search `row` and match `brown` then this is a good use-case for regexp – Neil McGuigan Oct 13 '16 at 03:32