9

I have my search term:

"Yellow large widgets"

I split the terms into its 3 words:

1 = "Yellow";
2 = "Large";
2 = "Widgets";

I then search with:

SELECT * FROM widgets
    WHERE (description LIKE '%yellow%' OR description LIKE '%large%' OR description LIKE 'widgets') 
    OR (title LIKE '%yellow%' OR title LIKE '%large%' OR title LIKE '%widgets%')

How can I sort the results with these biases?

  • Title takes presidence, if any of the terms appear in title they should be considered more important
  • Occurrence counts, results with higher total occurances should appear first

Ideal Methodoloy

  • Count occurrences in description.
  • Each occurrence here is worth 1 point.
  • Count occurrences in title.
  • Each title occurrence is worth 5 points.
  • Sort by points.

But I wouldn't know where to start doing that in SQL.

double-beep
  • 5,031
  • 17
  • 33
  • 41
Tom Gullen
  • 61,249
  • 84
  • 283
  • 456

4 Answers4

10

Okay, let's have your search terms in a temp table:

CREATE TABLE #SearchTerms (Term varchar(50) not null)
insert into #SearchTerms (Term)
select 'yellow' union all
select 'large' union all
select 'widgets'

And let's do something silly:

select
    widgets.ID,
    (LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) as DescScore
    (LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) as TitleScore
from
    widgets,#SearchTerms

We've now counted each occurrence of each term, in both the description and the title.

So now we can sum and weight those occurrences:

select
    widgets.ID,
    SUM((LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) +
    ((LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) *5)) as CombinedScore
from
    widgets,#SearchTerms
group by
    Widgets.ID

And if we need to do more with this, I'd recommend putting the above in a subselect

select
    w.*,CombinedScore
from
    widgets.w
       inner join
    (select
        widgets.ID,
        SUM((LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) +
        ((LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) *5)) as CombinedScore
    from
        widgets,#SearchTerms
    group by
        Widgets.ID
    ) t
        on
            w.ID = t.ID
where
    CombinedScore > 0
order by
    CombinedScore desc

(Note that I've assumed there's an ID column in all these examples, but that can be expanded into as many columns as are necessary to define the PK in the widgets table)


The real trick here is counting occurrences of a word within a larger body of text, which is done by:

(LEN(text) - LEN(text with each occurrence of term removed)) / LEN(term)
Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
3

Choose one of the following options:

  • use a fulltext search engine (Lucene or similiar) to do weighted full text search;
  • see this other StackOverflow question
  • do multiple SELECTs, each one with a relevance field, merge them with UNION and sort the result;
  • sort your results in your application, after having retrieved the results.
Community
  • 1
  • 1
Alessandro
  • 1,336
  • 8
  • 15
0

Maybe this way

   SELECT description, title ,
            (       IF(title LIKE '%yellow%', 2, 0) +
            IF(title LIKE '%large%', 2, 0) +
            IF(title LIKE '%widgets%', 2, 0)+       IF(description LIKE '%yellow%', 1, 0) +
            IF(description LIKE '%large%', 1, 0) +
            IF(description LIKE '%widgets%', 1, 0)      )   AS w 

    FROM widget ORDER BY w DESC
0

A quick hack (for mysql, you can use similar constructs on other dbs). Note, not tested.

SELECT description, title
  FROM
(SELECT description, title,
        IF(description LIKE '%yellow%' OR 
           description LIKE '%large%' OR 
           description LIKE 'widgets' 2, 0) +
        IF(title LIKE '%yellow%' OR 
           title LIKE '%large%' OR
           title LIKE '%widgets%', 1, 0) AS w
  FROM widget)
 WHERE w > 0
ORDER BY w
vickirk
  • 3,979
  • 2
  • 22
  • 37