Okay, so I am building a website with functionality similar to Facebook and Twitter, and its content is highly dynamic.

All the questions and search engines I have looked at rely on building an index, so they seem better suited to sites with more or less static content.

I need recommendations and advice on how to use these index-based search engines with a highly dynamic website, considering that new users will be joining every hour, new content will be generated, and existing content will be edited. Rebuilding the whole index every time just to keep search results from going stale seems rather absurd.

One solution that came close was MySQL MyISAM FULLTEXT search columns, but I really need an alternative: MyISAM's lack of foreign keys leads to data redundancy, and I need something that can scale as the website grows and is flexible enough to support custom ranking algorithms.

Thanks..

Abdullah Khan
  • See http://stackoverflow.com/questions/1381186/fulltext-search-with-innodb – The Scrum Meister Feb 11 '11 at 05:10
  • Hmm, the solutions posted there say that the only other way is to constantly update the index... – Abdullah Khan Feb 11 '11 at 05:36
  • Performance comes with a price, my friend. Is there any reason you do not want to use a search engine such as Lucene or Sphinx? – The Scrum Meister Feb 11 '11 at 05:54
  • It's not that I don't want to use one. It's just the thought that the only way to keep the index from going stale is a FULL rebuild. When I analyzed Facebook and Twitter, an item can be searched as soon as it is created, and I was hoping to get some advice on designing something along those lines. – Abdullah Khan Feb 11 '11 at 06:03

1 Answer

Sphinx allows partial indexes (usually called "delta" indexes in Sphinx terminology). So you have a main index and a secondary partial index that can be updated at any time. It's also incredibly fast at indexing, so you may find that rebuilding the entire index every, say, 5 minutes is fast enough for you. If it's not, use the partial index option and kick off a rebuild of the partial index every time a piece of content is added. Sphinx is used by Craigslist, which says something about how well it scales. We've had great luck with it on StartUpHire: it rebuilds our entire index in a couple of seconds, then signals the search daemon to use the newly built index.
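To make the main-plus-partial idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model of the pattern, not Sphinx's implementation or API: the class `MainDeltaSearch`, the helper `build_index`, and the whitespace tokenizer are all hypothetical names made up for illustration. New or edited content only triggers a rebuild of the small partial index, searches consult both indexes, and a periodic merge builds a fresh main index and swaps it in.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index (term -> set of doc ids) from {doc_id: text}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


class MainDeltaSearch:
    """Toy model: a large main index plus a small, frequently rebuilt delta."""

    def __init__(self, docs):
        self.main_docs = dict(docs)
        self.delta_docs = {}
        self.main = build_index(self.main_docs)
        self.delta = build_index(self.delta_docs)

    def add(self, doc_id, text):
        # New or edited content only touches the delta, so this rebuild is cheap.
        self.delta_docs[doc_id] = text
        self.delta = build_index(self.delta_docs)

    def search(self, term):
        term = term.lower()
        # Main-index hits for documents that also live in the delta are stale;
        # the delta copy is treated as authoritative.
        hits = {d for d in self.main.get(term, set()) if d not in self.delta_docs}
        return hits | self.delta.get(term, set())

    def merge(self):
        # Periodic full rebuild: build the new main index first, then swap it in.
        merged = {**self.main_docs, **self.delta_docs}
        new_main = build_index(merged)
        self.main_docs, self.main = merged, new_main
        self.delta_docs, self.delta = {}, build_index({})


engine = MainDeltaSearch({1: "hello world", 2: "sphinx search"})
engine.add(3, "hello sphinx")   # searchable immediately via the delta
engine.add(1, "goodbye world")  # an edit shadows the stale main-index copy
print(sorted(engine.search("hello")))
engine.merge()                  # e.g. run from a scheduled job
```

The point of the split is that the cost of `add` depends on the size of the delta, not on the size of the whole corpus, while `merge` can run on whatever schedule keeps the delta small.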

I'd highly recommend giving it a try before you say it's not a good fit.

sheetzam