I have been tasked with creating a site-wide search feature. The search needs to look at articles, events and page content.

I've used MATCH()/AGAINST() in MySQL before and know how to get the relevance of a result. As far as I know, though, the relevance is specific to each search (the table's contents, number of rows, etc.), so the relevance of results from the articles table won't be comparable to the relevance of results from the events table.

Is there any way to unify the relevance so that results from all three tables have a comparable relevance?

michael
  • Logically this seems to be a good place to use a union and sub selects with match against; but I've never used it to search in this fashion; so I doubt this is the BEST way. – xQbert Jan 26 '12 at 13:16
  • would there be any way for you to weight the relevances? just a simple multiply – bowlerae Jan 26 '12 at 13:46
  • I wondered about normalising the highest relevance to 1 but that still throws the results out across multiple tables – michael Jan 26 '12 at 13:51
  • Can you post the table structure and the expected results? That would help us understand the problem better. – Angelin Nadar Jan 26 '12 at 14:45

2 Answers

Yes, you can unify them very well using a search engine such as Apache Lucene and Solr.

http://lucene.apache.org/solr/

If you need to do it only in MySQL, you can do this with a UNION. You'll probably want to suppress any results with zero relevance.

You'll need to decide how you want to affect the relevance depending on which table matches.

For example, suppose you want articles to be the most important, events to be of medium importance, and pages to be the least important. You can use multipliers like this:

set @articles_multiplier=3;
set @events_multiplier=2;
set @pages_multiplier=1;

Here's a working example you can try that demonstrates some of these techniques:

Create sample data:

create database d;
use d;

create table articles (id int primary key, content text) ENGINE = MYISAM;
create table events (id int primary key, content text) ENGINE = MYISAM;
create table pages (id int primary key, content text) ENGINE = MYISAM;

insert into articles values 
(1, "Lorem ipsum dolor sit amet"),
(2, "consectetur adipisicing elit"),
(3, "sed do eiusmod tempor incididunt");

insert into events values 
(1, "Ut enim ad minim veniam"),
(2, "quis nostrud exercitation ullamco"),
(3, "laboris nisi ut aliquip");

insert into pages values 
(1, "Duis aute irure dolor in reprehenderit"),
(2, "in voluptate velit esse cillum"),
(3, "dolore eu fugiat nulla pariatur.");

Make it searchable:

ALTER TABLE articles ADD FULLTEXT(content);
ALTER TABLE events ADD FULLTEXT(content);
ALTER TABLE pages ADD FULLTEXT(content);

Use a UNION to search all these tables:

set @target='dolor';

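SELECT * FROM (
  -- UNION ALL skips the duplicate-removal pass that plain UNION performs;
  -- the table_name column already keeps rows from different tables distinct.
  SELECT
    'articles' AS table_name,
    id,
    @articles_multiplier * (MATCH(content) AGAINST (@target)) AS relevance
    FROM articles
  UNION ALL
  SELECT
    'events' AS table_name,
    id,
    @events_multiplier * (MATCH(content) AGAINST (@target)) AS relevance
    FROM events
  UNION ALL
  SELECT
    'pages' AS table_name,
    id,
    @pages_multiplier * (MATCH(content) AGAINST (@target)) AS relevance
    FROM pages
) AS sitewide
WHERE relevance > 0          -- suppress rows that did not match at all
ORDER BY relevance DESC;     -- best weighted match first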

The result:

+------------+----+------------------+
| table_name | id | relevance        |
+------------+----+------------------+
| articles   |  1 | 1.98799377679825 |
| pages      |  3 | 0.65545331108093 |
+------------+----+------------------+
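
As a variation, the normalisation idea raised in the comments on the question could be combined with the UNION. The following is only a sketch, not part of the tested example above: it divides each row's score by the best score in its own table, so every table's top hit scores 1. The aliases `a`, `e`, `p`, `m` and `max_rel` are names chosen for illustration. Note that this only rescales each table; it does not make the underlying scores truly comparable, so you may still want multipliers on top.

```sql
SET @target = 'dolor';

-- Each branch divides by the highest relevance found in that table.
-- The WHERE clause keeps only matching rows, so max_rel is never 0
-- for a row that is actually returned.
SELECT 'articles' AS table_name, a.id,
       MATCH(a.content) AGAINST (@target) / m.max_rel AS relevance
  FROM articles AS a
 CROSS JOIN (SELECT MAX(MATCH(content) AGAINST (@target)) AS max_rel
               FROM articles) AS m
 WHERE MATCH(a.content) AGAINST (@target)
UNION ALL
SELECT 'events', e.id,
       MATCH(e.content) AGAINST (@target) / m.max_rel
  FROM events AS e
 CROSS JOIN (SELECT MAX(MATCH(content) AGAINST (@target)) AS max_rel
               FROM events) AS m
 WHERE MATCH(e.content) AGAINST (@target)
UNION ALL
SELECT 'pages', p.id,
       MATCH(p.content) AGAINST (@target) / m.max_rel
  FROM pages AS p
 CROSS JOIN (SELECT MAX(MATCH(content) AGAINST (@target)) AS max_rel
               FROM pages) AS m
 WHERE MATCH(p.content) AGAINST (@target)
ORDER BY relevance DESC;
```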
joelparkerhenderson
  • This is awesome! I have a very similar question, but I need related matches. Could you take a look at it too? http://stackoverflow.com/q/9953922/633513 – LordZardeck Mar 31 '12 at 07:23
(Sorry, I wanted to leave this as a comment on the answer above, but I don't have enough reputation to comment.)

Be aware that a UNION inside a subquery is very poorly optimized. A frequent case is pagination: if you use "LIMIT @page * 10, 10" in the parent query, MySQL must first fetch all the results from every subquery before it can evaluate the parent query.
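
One possible way to soften this, sketched here under assumptions rather than measured: give each branch its own ORDER BY and LIMIT so that it returns at most (offset + page size) rows, then sort and page the much smaller combined set. The example assumes page 2 with 10 rows per page and reuses the tables and multipliers from the answer above. The limits are written as literals because MySQL does not accept user variables in LIMIT outside prepared statements and stored programs.

```sql
SET @target = 'dolor';
SET @articles_multiplier = 3;
SET @events_multiplier = 2;
SET @pages_multiplier = 1;

-- Page 2, 10 rows per page: skip 10 rows, return 10.
-- Any row in the global top 20 must also be in the top 20 of its own
-- table, so each branch can stop at 20 rows.
(SELECT 'articles' AS table_name, id,
        @articles_multiplier * MATCH(content) AGAINST (@target) AS relevance
   FROM articles
  WHERE MATCH(content) AGAINST (@target)
  ORDER BY relevance DESC
  LIMIT 20)
UNION ALL
(SELECT 'events', id,
        @events_multiplier * MATCH(content) AGAINST (@target)
   FROM events
  WHERE MATCH(content) AGAINST (@target)
  ORDER BY 3 DESC          -- third column is the weighted relevance
  LIMIT 20)
UNION ALL
(SELECT 'pages', id,
        @pages_multiplier * MATCH(content) AGAINST (@target)
   FROM pages
  WHERE MATCH(content) AGAINST (@target)
  ORDER BY 3 DESC
  LIMIT 20)
ORDER BY relevance DESC    -- applies to the combined result
LIMIT 10, 10;
```

The per-branch limit still grows with the page number, so deep pages remain expensive; this only avoids materialising every matching row for the early pages.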

kien