How do I SQL query for words with punctuation in Postgresql?

Question

If I have strings/phrases like this stored in the database:

What are Q-type Operations?

Programmer's Guide

A.B.C's of Coding

Is there a way to pass a query parameter in like "Programmers" or "abc" or "q-type" and have it find "Programmer's", "A.B.C" and "Q-type"?

intgr · Accepted Answer · 2011-03-18T16:13:36.733

tsvector

Use the tsvector type, which is part of the PostgreSQL text-search feature.

postgres> select 'What are Q-type Operations?'::tsvector;
              tsvector               
-------------------------------------
 'Operations?' 'Q-type' 'What' 'are'
(1 row)

You can use familiar operators on tsvectors as well:

postgres> select 'What are Q-type Operations?'::tsvector
postgres>        || 'A.B.C''s of Coding'::tsvector;
                           ?column?                           
--------------------------------------------------------------
 'A.B.C''s' 'Coding' 'Operations?' 'Q-type' 'What' 'are' 'of'

From tsvector documentation:

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details). Sorting and duplicate-elimination are done automatically during input

If you also want to do language-specific normalization, like removing common words ('the', 'a', etc) and multiplies, use the to_tsvector function. It also assigns weights to different words for text search:

postgres> select to_tsvector('english',
postgres> 'What are Q-type Operations? A.B.C''s of Coding');
                      to_tsvector                       
--------------------------------------------------------
 'a.b.c':7 'code':10 'oper':6 'q':4 'q-type':3 'type':5
(1 row)

Full-blown text search

Obviously doing this for every row in a query will be expensive -- so you should store the tsvector in a separate column and use ts_query() to search for it. This also allows you to create a GiST index on the tsvector.

postgres> insert into text (phrase, tsvec)
postgres>   values('What are Q-type Operations?',
postgres>   to_tsvector('english', 'What are Q-type Operations?'));
INSERT 0 1

Searching is done using tsquery and the @@ operator:

postgres> select phrase from text where tsvec @@ to_tsquery('q-type');
           phrase            
-----------------------------
 What are Q-type Operations?
(1 row)

score 2 · Answer 2 · answered Mar 18 '11 at 15:41

2

You could try with an ILIKE with a TRANSLATE function, see here.

For example: translate(field, '.-\'', '')

answered Mar 18 '11 at 15:41

Joril

19,961
13
71
88

aw this would've been a perfect short-term solution to searching simple strings, except I can't get that apostrophe to escape in Rails! `Article.find(:conditions => ["translate(articles.title, '.-\'', '') ILIKE ?", query])` throws `...unterminated quoted string...` (tried several methods of escaping). Might go with tsvector, more likely solr. Thanks! – Lance Mar 18 '11 at 16:43
Whoops, maybe to escape an apostrophe you have to double it: `''` – Joril Mar 18 '11 at 16:45
It might be helpful to use the generalized single quote syntax that ruby provides: `Article.find(:conditions => [ %q{translate(articles.title, '.-\'', '') ILIKE ?}, query])`. Otherwise ruby will catch the \ that you're trying to pass on to sql, and you'd have to use 2 or 3 '\'s. Ruby doesn't escape within single quoted strings, the way it does with double quoted strings. ie `puts('1\n2'); puts("3\n4") #=> 1\n2*newline*3*newline*4*newline*` – Tim Snowhite Mar 22 '11 at 16:08

score 1 · Answer 3 · edited May 23 '17 at 10:27

1

Here's another link that can be relevant. Strip the value of the field from all punctuation before comparing it to the search string.

SQL Server: How do you remove punctuation from a field?

edited May 23 '17 at 10:27

Community

1
1

answered Mar 18 '11 at 15:44

Ilya Kogan

21,995
15
85
141

score 0 · Answer 4 · edited Nov 07 '13 at 18:19

This sounds like you want something along these lines:

http://www.postgresql.org/docs/9.0/static/fuzzystrmatch.html

I'm not 100% sure if that will cover what you want though.

EDIT I had to run this up locally to check (Using PostgreSQL 9.0 on Windows)

Here's what I found:

template1=> select soundex('Programmers'), soundex('Programmer''s');
 soundex | soundex
---------+---------
 P626    | P626
(1 row)


template1=> select soundex('abc'), soundex('A.B.C.');
 soundex | soundex
---------+---------
 A120    | A120
(1 row)


template1=> select soundex('Q-type'), soundex('q-type');
 soundex | soundex
---------+---------
 Q310    | Q310
(1 row)

So if you were to do soundex(colname) = soundex(<user param>) should get you what you need in the where clause.

You will need to install the fuzzystrmatch module:

psql -U <dbowner> -d <database> -f SHAREDIR/contrib/fuzzystrmatch.sql

Refer to the documentation on how to locate SHAREDIR

EDIT I just noticed what I overlooked, I think this combined with the ts_vector functionality may get you where you are aiming for.

score 0 · Answer 5 · answered Mar 18 '11 at 15:58

Postgresql supports full-text searching by converting text input to tsvector types:

steve@steve@[local] =# select input, to_tsvector('english', input)\
   from (values('What are Q-type Operations?'),('Programmer''s Guide'),('A.B.C''s of Coding')) x(input);
            input            |            to_tsvector             
-----------------------------+------------------------------------
 What are Q-type Operations? | 'oper':6 'q':4 'q-type':3 'type':5
 Programmer's Guide          | 'guid':3 'programm':1
 A.B.C's of Coding           | 'a.b.c':1 'code':4
(3 rows)

As you can see, the stemming used by default will make "programming" "programmer" and "programmer's" all match identically.

You would typically use this by having an indexed tsvector column or expression, and then using the @@ operator to match that with a tsquery, e.g.:

steve@steve@[local] =# select input, to_tsvector('english', input) \
   from (values('What are Q-type Operations?'),('Programmer''s Guide'),('A.B.C''s of Coding')) x(input)\
   where to_tsvector('english', input) @@ plainto_tsquery('english', 'programmers');
       input        |      to_tsvector      
--------------------+-----------------------
 Programmer's Guide | 'guid':3 'programm':1
(1 row)

Here plainto_tsquery analyses a user input string, and produces a query where every non-stop word in the query has to be matched by a tsvector.

score -1 · Answer 6 · answered Mar 18 '11 at 15:46

-1

Postgres supports pattern matching so you can build a regular expression in your where clause http://www.postgresql.org/docs/8.3/static/functions-matching.html

answered Mar 18 '11 at 15:46

sreimer

4,913
2
33
43

A link to the docs is not particularly useful. – AntonOfTheWoods Oct 17 '22 at 05:37

How do I SQL query for words with punctuation in Postgresql?

6 Answers6

tsvector

Full-blown text search