9

I want to realize a fulltext search in postgresql combined with a kind of fuzzy search. For my testarea I followed up this article: https://blog.lateral.io/2015/05/full-text-search-in-milliseconds-with-postgresql/ and everything is working fine. But sometimes I have search cases with and without spaces in the searach string like this:

In my 'title'-column there is an entry like 'test123'. My searchstring looks like 'test 123' with a space in it. How can I get a hit in this testcase?

My search-sql-query looks like:

SELECT * 
FROM test, plainto_tsquery('test:*&123:*') as q 
WHERE (tsv @@ q)

result: 0 rows

So I tried to figure out if I can use pg_trgm combined with ts_vector but I can not find a solution. Do you have an idea?

Chris
  • 121
  • 1
  • 9

1 Answers1

16

As the documentation on parsing states:

...plainto_tsquery will not recognize tsquery operators, weight labels, or prefix-match labels in its input...

plainto_tsquery and phraseto_tsquery are convenience functions which make it easier to search by a full string, but they don't support all of the features. Use to_tsquery instead which accepts the full search syntax:

SELECT * 
FROM test, to_tsquery('testing:* & 123:*') as q 
WHERE (tsv @@ q)

This function also requires you to normalize the search query in the same way you normalize the text you're searching by using to_tsvector, but that's pretty easy with some string functions:

SELECT string_agg(lexeme || ':*', ' & ' order by positions) 
FROM unnest(to_tsvector('testing 123'))

This basically gets the individual tokens from to_tsvector, appends :* to each, then joins them with & to create a single string. The example above takes testing 123 and produces testing:* & 123:* which you can then use directly with to_tsquery to get fuzzy matching with the normalization intact.

You can combine it all together into a CTE to make it simple:

WITH search AS (
    SELECT to_tsquery(string_agg(lexeme || ':*', ' & ' order by positions)) AS query
    FROM unnest(to_tsvector('enter your search query here'))
)
SELECT test.*
FROM test, search
WHERE (test.tsv @@ search.query)

This assumes that the table has a tsv column of datatype tsquery which is pregenerated instead of creating it on every query (which is much slower). PG12+ supports generated columns which can keep this updated automatically.

More in-depth answer on my blog: https://manigandham.com/post/fuzzy-fulltext-search-postgresql

Mani Gandham
  • 7,688
  • 1
  • 51
  • 60
  • I seem to be missing something. This is not fuzzy matching but just prefix matching, is it? I'm trying to replicate a similar example to the one from the OP, and not even that works for me because it's only matching prefixes and not postfixes. But even if the example worked wiht pre and postfixes, it's still not fuzzy matching, is it? Fuzzy matching means that `samething` and `something` matches. I have PostgreSQL version 11.20. – Manuel Schmidt Jun 28 '23 at 10:08