Background:
SQL database representing different types of events (concerts, football matches, charity collections etc.), where each contain event-related data (concert - artist name, match - host/visitor team). All of these events inherit from one, general table event
, which contains data related to all of them (name, description, location, start/end date).
Inheritance is implemented using table-per-subclass model known from Hibernate or Doctrine. The database also stores tables artists
(id
, name
, birth_date
) and football_teams
(id
, name
, country
, coach_name
) used in event_concerts
and event_football_matches
tables (through FKs).
Problem:
Create a search engine that given some criteria ({name: "manchester", startDate: "01.01.2012 - 01.02.2012"}
or {location: "london", description: "artists +metallica -bieber"}
) will return all events that meet the criteria, as well as results from artists
/football_teams
tables.
Some properties of those events contain large pieces of text, that should be searched through in fulltext-search manner.
Example:
Given following search criteria:
{ location: "london", startDate: "05.11.2012 - 07.11.2012" }
Search engine should return:
- (football event) Arsenal vs Manchester United match, Emirates Stadium, London, 06.11.2012
- (concert event) Metallica concert, Some-Fancy-Location, 05.11.2012
- (football team/not an event) Arsenal, founded: 1886, league: Premier League
- (football team/not an event) Chelsea, founded: 1905, league: Premier League
- (festival event) Halloween in London, 07.11.2012
- (dance event) Sleeping Beauty at Sadler's Wells, £45, 07.11.2012
- (musician, not an event) Neil Christian, 1943 - 2012, Rock'n'Roll vocalist
As you can see, startDate (event-related property) is considered only in case of events.
Search engine has to scan lots of tables, that's why I believe I should use dedicated software (Sphinx, Lucene, ...?) and create separate index just for the searching.
Could anyone suggest some solution for building such an index? What software could I use as a base for that search engine?
EDIT:
Just to clarify: none of the properties is required. Some of them contain dates which will be searched using exact-match, some of them contain short text (like a location) that also will be searched using exact-match. But some of them contain long pieces of text, and that needs to be searched in full-text manner.