2

I'm writing some very simple search functionality for a list of FAQ's. I'm splitting the search string on a variety of characters, including spaces. Then performing a select along the lines of

SELECT *
FROM "faq"
WHERE
    ((LOWER("Question") LIKE '%what%'
   OR LOWER("Question") LIKE '%is%'
   OR LOWER("Question") LIKE '%a%'
   OR LOWER("Question") LIKE '%duck%'))

I've had to edit this slightly as its generated by our data access layer but it should give you an idea of whats going on.

The problem is demonstrated well with the above query in that most questions are likely to have the words a or is in them, however I can not filter these out as acronyms may well be important for the searcher. What has been suggested is we order by the number of matching keywords. However I have been unable to find a way of doing this in SQL (we do not have time to create a simple search engine with an index of keywords etc). Does anyone know if there's a way of counting the number of LIKE matches in an SQL statement and ordering by that so that the questions with the most keywords appear at the top of the results?

Daniel Böhmer
  • 14,463
  • 5
  • 36
  • 46
ClearCarbon
  • 459
  • 4
  • 13

2 Answers2

3

I assume the list of matching keywords is being entered by the user and inserted into the query dynamically by the application, immediately prior to executing the query. If so, I suggest amending the query like so:

SELECT *
FROM "faq"
WHERE
    ((LOWER("Question") LIKE '%what%'
   OR LOWER("Question") LIKE '%is%'
   OR LOWER("Question") LIKE '%a%'
   OR LOWER("Question") LIKE '%duck%'))
order by
    case when LOWER("Question") LIKE '%what%' then 1 else 0 end +
    case when LOWER("Question") LIKE '%is%' then 1 else 0 end +
    case when LOWER("Question") LIKE '%a%' then 1 else 0 end +
    case when LOWER("Question") LIKE '%duck%' then 1 else 0 end
descending;

This would even enable you to "weight" the importance of each selection term, assuming the user (or an algorithm) could assign a weighting to each term.

One caveat: if your query is being constructed dynamically, are you aware of the risk of SQL Insertion attacks?

  • Thanks seems to work perfectly in PgAdmin. We use a data access layer called Entityspaces which sanitises the data for us. However we may have to build this query ourselves so will proceed with caution :) – ClearCarbon Dec 16 '11 at 13:37
  • should just noted I had to change the descending to desc in pgAdmin to get it to work. – ClearCarbon Dec 16 '11 at 15:02
2

You can write a function which counts the occurrences of one string in another like this:

CREATE OR REPLACE FUNCTION CountInString(text,text)
RETURNS integer AS $$
 SELECT(Length($1) - Length(REPLACE($1, $2, ''))) / Length($2) ;
$$ LANGUAGE SQL IMMUTABLE;

And use it in the select: select CountInString("Question",' what ') from "faq".

ivan
  • 390
  • 3
  • 13
  • I'm fairly sure this would of worked just as well however the order by case solution is more appropriate to our situation at the moment. – ClearCarbon Dec 16 '11 at 15:03