15

I know about full-text search, but that only matches your query against individual words. I want to select strings that contain a word that starts with words in my query. For example, if I search:

appl

the following should match:

a really nice application
apples are cool
appliances

since all those strings contains words that start with appl. In addition, it would be nice if I could select the number of words that match, and sort based on that.

How can I implement this in PostgreSQL?

2 Answers2

17

Prefix matching with Full Text Search

FTS supports prefix matching. Your query works like this:

SELECT * FROM tbl
WHERE  to_tsvector('simple', string) @@ to_tsquery('simple', 'appl:*');

Note the appended :* in the tsquery. This can use an index. See:

Alternative with regular expressions

SELECT * FROM tbl
WHERE  string ~ '\mappl';

Quoting the manual here:

\m .. matches only at the beginning of a word

To order by the count of matches, you could use regexp_matches()

SELECT tbl_id, count(*) AS matches
FROM  (
   SELECT tbl_id, regexp_matches(string, '\mappl', 'g')
   FROM   tbl
   WHERE  string ~ '\mappl'
   ) sub
GROUP  BY tbl_id
ORDER  BY matches DESC;

Or regexp_split_to_table():

SELECT tbl_id, string, count(*) - 1 AS matches
FROM  (
   SELECT tbl_id, string, regexp_split_to_table(string, '\mappl')
   FROM   tbl
   WHERE  string ~ '\mappl'
   ) sub
GROUP  BY 1, 2
ORDER  BY 3 DESC, 2, 1;

db<>fiddle here
Old sqlfiddle

Postgres 9.3 or later has index support for simple regular expressions with a trigram GIN or GiST index. The release notes for Postgres 9.3:

Add support for indexing of regular-expression searches in pg_trgm (Alexander Korotkov)

See:

Depesz wrote a blog about index support for regular expressions.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • SELECT * FROM tbl WHERE to_tsvector('simple', string) @@ to_tsquery('simple', 'appl:*'); this is working fine with string but not for numebrs – dev_sk Dec 30 '21 at 12:15
  • SELECT * FROM tbl WHERE to_tsvector('simple', string) @@ to_tsquery('simple', '111'); this is not working – dev_sk Dec 30 '21 at 12:16
  • @dev_sk: works for me. *db<>fiddle [here](https://dbfiddle.uk/?rdbms=postgres_14&fiddle=c79d63b9567fb39bbc572500ff97b192)* – Erwin Brandstetter Dec 30 '21 at 13:35
12
SELECT * FROM some_table WHERE some_field LIKE 'appl%' OR some_field LIKE '% appl%';

As for counting the number of words that match, I believe that would be too expensive to do dynamically in postgres (though maybe someone else knows better). One way you could do it is by writing a function that counts occurrences in a string, and then add ORDER BY myFunction('appl', some_field). Again though, this method is VERY expensive (i.e. slow) and not recommended.

For things like that, you should probably use a separate/complimentary full-text search engine like Sphinx Search (google it), which is specialized for that sort of thing.

An alternative to that, is to have another table that contains keywords and the number of occurrences of those keywords in each string. This means you need to store each phrase you have (e.g. really really nice application) and also store the keywords in another table (i.e. really, 2, nice, 1, application, 1) and link that keyword table to your full-phrase table. This means that you would have to break up strings into keywords as they are entered into your database and store them in two places. This is a typical space vs speed trade-off.

Hamza Kubba
  • 2,259
  • 14
  • 12
  • I'm reading up on `LIKE` right now. What if I have more than one word in the search query? What if I want to search `appl goog` and have it match `Apple and Google are good companies`? –  Oct 14 '13 at 21:31
  • 1
    `SELECT * FROM some_table WHERE (some_field LIKE 'appl%' OR some_field LIKE '% appl%') AND (some_field LIKE 'goog%' OR some_field LIKE '% goog%');` – Hamza Kubba Oct 14 '13 at 21:35
  • I do want to emphasize that this is a hackish way to do it. For example, the following would not match the `appl` search: `"apples are my favorite!" exclaimed the boy.` because there is a " before the appl. Also, you might want to use ILIKE (for case insensitive searching). – Hamza Kubba Oct 14 '13 at 21:39