9

If I have strings/phrases like this stored in the database:

  • What are Q-type Operations?
  • Programmer's Guide
  • A.B.C's of Coding

Is there a way to pass a query parameter in like "Programmers" or "abc" or "q-type" and have it find "Programmer's", "A.B.C" and "Q-type"?

Joril
  • 19,961
  • 13
  • 71
  • 88
Lance
  • 75,200
  • 93
  • 289
  • 503

6 Answers6

6

tsvector

Use the tsvector type, which is part of the PostgreSQL text-search feature.

postgres> select 'What are Q-type Operations?'::tsvector;
              tsvector               
-------------------------------------
 'Operations?' 'Q-type' 'What' 'are'
(1 row)

You can use familiar operators on tsvectors as well:

postgres> select 'What are Q-type Operations?'::tsvector
postgres>        || 'A.B.C''s of Coding'::tsvector;
                           ?column?                           
--------------------------------------------------------------
 'A.B.C''s' 'Coding' 'Operations?' 'Q-type' 'What' 'are' 'of'

From tsvector documentation:

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details). Sorting and duplicate-elimination are done automatically during input

If you also want to do language-specific normalization, like removing common words ('the', 'a', etc) and multiplies, use the to_tsvector function. It also assigns weights to different words for text search:

postgres> select to_tsvector('english',
postgres> 'What are Q-type Operations? A.B.C''s of Coding');
                      to_tsvector                       
--------------------------------------------------------
 'a.b.c':7 'code':10 'oper':6 'q':4 'q-type':3 'type':5
(1 row)

Full-blown text search

Obviously doing this for every row in a query will be expensive -- so you should store the tsvector in a separate column and use ts_query() to search for it. This also allows you to create a GiST index on the tsvector.

postgres> insert into text (phrase, tsvec)
postgres>   values('What are Q-type Operations?',
postgres>   to_tsvector('english', 'What are Q-type Operations?'));
INSERT 0 1

Searching is done using tsquery and the @@ operator:

postgres> select phrase from text where tsvec @@ to_tsquery('q-type');
           phrase            
-----------------------------
 What are Q-type Operations?
(1 row)
intgr
  • 19,834
  • 5
  • 59
  • 69
2

You could try with an ILIKE with a TRANSLATE function, see here.

For example: translate(field, '.-\'', '')

Joril
  • 19,961
  • 13
  • 71
  • 88
  • aw this would've been a perfect short-term solution to searching simple strings, except I can't get that apostrophe to escape in Rails! `Article.find(:conditions => ["translate(articles.title, '.-\'', '') ILIKE ?", query])` throws `...unterminated quoted string...` (tried several methods of escaping). Might go with tsvector, more likely solr. Thanks! – Lance Mar 18 '11 at 16:43
  • Whoops, maybe to escape an apostrophe you have to double it: `''` – Joril Mar 18 '11 at 16:45
  • It might be helpful to use the generalized single quote syntax that ruby provides: `Article.find(:conditions => [ %q{translate(articles.title, '.-\'', '') ILIKE ?}, query])`. Otherwise ruby will catch the \ that you're trying to pass on to sql, and you'd have to use 2 or 3 '\'s. Ruby doesn't escape within single quoted strings, the way it does with double quoted strings. ie `puts('1\n2'); puts("3\n4") #=> 1\n2*newline*3*newline*4*newline*` – Tim Snowhite Mar 22 '11 at 16:08
1

Here's another link that can be relevant. Strip the value of the field from all punctuation before comparing it to the search string.

SQL Server: How do you remove punctuation from a field?

Community
  • 1
  • 1
Ilya Kogan
  • 21,995
  • 15
  • 85
  • 141
0

This sounds like you want something along these lines:

http://www.postgresql.org/docs/9.0/static/fuzzystrmatch.html

I'm not 100% sure if that will cover what you want though.

EDIT I had to run this up locally to check (Using PostgreSQL 9.0 on Windows)

Here's what I found:

template1=> select soundex('Programmers'), soundex('Programmer''s');
 soundex | soundex
---------+---------
 P626    | P626
(1 row)


template1=> select soundex('abc'), soundex('A.B.C.');
 soundex | soundex
---------+---------
 A120    | A120
(1 row)


template1=> select soundex('Q-type'), soundex('q-type');
 soundex | soundex
---------+---------
 Q310    | Q310
(1 row)

So if you were to do soundex(colname) = soundex(<user param>) should get you what you need in the where clause.

You will need to install the fuzzystrmatch module:

psql -U <dbowner> -d <database> -f SHAREDIR/contrib/fuzzystrmatch.sql

Refer to the documentation on how to locate SHAREDIR

EDIT I just noticed what I overlooked, I think this combined with the ts_vector functionality may get you where you are aiming for.

shybovycha
  • 11,556
  • 6
  • 52
  • 82
Dave G
  • 9,639
  • 36
  • 41
0

Postgresql supports full-text searching by converting text input to tsvector types:

steve@steve@[local] =# select input, to_tsvector('english', input)\
   from (values('What are Q-type Operations?'),('Programmer''s Guide'),('A.B.C''s of Coding')) x(input);
            input            |            to_tsvector             
-----------------------------+------------------------------------
 What are Q-type Operations? | 'oper':6 'q':4 'q-type':3 'type':5
 Programmer's Guide          | 'guid':3 'programm':1
 A.B.C's of Coding           | 'a.b.c':1 'code':4
(3 rows)

As you can see, the stemming used by default will make "programming" "programmer" and "programmer's" all match identically.

You would typically use this by having an indexed tsvector column or expression, and then using the @@ operator to match that with a tsquery, e.g.:

steve@steve@[local] =# select input, to_tsvector('english', input) \
   from (values('What are Q-type Operations?'),('Programmer''s Guide'),('A.B.C''s of Coding')) x(input)\
   where to_tsvector('english', input) @@ plainto_tsquery('english', 'programmers');
       input        |      to_tsvector      
--------------------+-----------------------
 Programmer's Guide | 'guid':3 'programm':1
(1 row)

Here plainto_tsquery analyses a user input string, and produces a query where every non-stop word in the query has to be matched by a tsvector.

araqnid
  • 127,052
  • 24
  • 157
  • 134
-1

Postgres supports pattern matching so you can build a regular expression in your where clause http://www.postgresql.org/docs/8.3/static/functions-matching.html

sreimer
  • 4,913
  • 2
  • 33
  • 43