tsvector
Use the tsvector
type, which is part of the PostgreSQL text-search feature.
postgres> select 'What are Q-type Operations?'::tsvector;
tsvector
-------------------------------------
'Operations?' 'Q-type' 'What' 'are'
(1 row)
You can use familiar operators on tsvectors as well:
postgres> select 'What are Q-type Operations?'::tsvector
postgres> || 'A.B.C''s of Coding'::tsvector;
?column?
--------------------------------------------------------------
'A.B.C''s' 'Coding' 'Operations?' 'Q-type' 'What' 'are' 'of'
From tsvector documentation:
A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details). Sorting and duplicate-elimination are done automatically during input
If you also want to do language-specific normalization, like removing common words ('the', 'a', etc) and multiplies, use the to_tsvector
function. It also assigns weights to different words for text search:
postgres> select to_tsvector('english',
postgres> 'What are Q-type Operations? A.B.C''s of Coding');
to_tsvector
--------------------------------------------------------
'a.b.c':7 'code':10 'oper':6 'q':4 'q-type':3 'type':5
(1 row)
Full-blown text search
Obviously doing this for every row in a query will be expensive -- so you should store the tsvector in a separate column and use ts_query() to search for it. This also allows you to create a GiST index on the tsvector.
postgres> insert into text (phrase, tsvec)
postgres> values('What are Q-type Operations?',
postgres> to_tsvector('english', 'What are Q-type Operations?'));
INSERT 0 1
Searching is done using tsquery and the @@ operator:
postgres> select phrase from text where tsvec @@ to_tsquery('q-type');
phrase
-----------------------------
What are Q-type Operations?
(1 row)