I'm building an ASP.NET MVC site where I plan to use Lucene.Net. I've envisioned a way to structure the usage of Lucene, but I'm not sure whether my planned architecture is sound and efficient.


My Plan:

  • On the Application_Start event in Global.asax: I check for the existence of the index on the file system. If it doesn't exist, I create it and fill it with documents extracted from the database.
  • When new content is submitted: I create an IndexWriter, fill up a document, write to the index, and finally dispose of the IndexWriter. IndexWriters are not reused, as I can't imagine a good way to do that in an ASP.NET MVC application.
  • When content is edited: I repeat the same process as when new content is submitted, except that I first delete the old content and then add the edits.
  • When a user searches for content: I check HttpRuntime.Cache to see if a user has already searched for this term in the last 5 minutes - if they have, I return those results; otherwise, I create an IndexReader, build and run a query, put the results in HttpRuntime.Cache, return them to the user, and finally dispose of the IndexReader. Once again, IndexReaders aren't reused.

My Questions:

  • Is that a good structure - how can I improve it?
  • Are there any performance/efficiency problems I should be aware of?
  • Also, is not reusing the IndexReaders and IndexWriters a huge code smell?
Maxim Zaslavsky
  • It would be awesome if you wrote a short step-by-step tutorial on how you integrated Lucene.NET with your ASP.NET MVC site, preferably as a wiki-style answer on SO. – Petrus Theron Dec 11 '12 at 13:04
  • @FreshCode Good call. My implementation is not perfect, but it works, and I think I'll write it up as soon as my finals end next week. I've been meaning to publish a bunch of ASP.NET MVC helpers anyway, so I'll keep you posted. – Maxim Zaslavsky Dec 12 '12 at 03:26
  • @MaximZaslavsky Did you ever write that tutorial? I would be interested in reading it. – Jean-François Beauchamp Jun 05 '13 at 18:14

2 Answers

The answer to all three of your questions is the same: reuse your readers (and possibly your writers). You can use a singleton pattern to do this (i.e. declare your reader/writer as public static). Lucene's FAQ tells you the same thing: share your readers, because the first query is reaaalllyyyy slow. Lucene handles all the locking for you, so there is really no reason why you shouldn't have a shared reader.

It's probably easiest to just keep your writer around and (using the NRT model) get the readers from that. If it's rare that you are writing to the index, or if you don't have a huge need for speed, then it's probably OK to open your writer each time instead. That is what I do.

Edit: added a code sample:

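// Assumes Lucene.Net 2.9/3.0 and that myDir is a static Lucene.Net.Store.Directory opened elsewhere.
public static readonly IndexWriter writer = new IndexWriter(
    myDir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), IndexWriter.MaxFieldLength.UNLIMITED);

public JsonResult SearchForStuff(string query)
{
    // Near-real-time reader: also sees changes the writer has not committed yet.
    IndexReader reader = writer.GetReader();
    IndexSearcher search = new IndexSearcher(reader);
    // do the search, build the JsonResult, then close the reader
}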
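
If you would rather persist a reader than the writer, the refresh step might look roughly like the sketch below. It assumes Lucene.Net 3.0.x, where IsCurrent() and Reopen() are methods on IndexReader. SharedSearcher, Init, and GetSearcher are made-up names for illustration, not part of Lucene.Net.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;
// Alias avoids a clash with System.IO.Directory.
using LuceneDirectory = Lucene.Net.Store.Directory;

// Hypothetical helper (not part of Lucene.Net): holds one shared read-only
// reader and swaps in a fresh one when the index has changed on disk.
public static class SharedSearcher
{
    private static readonly object Sync = new object();
    private static IndexReader reader;

    // Call once at startup, after the index has been created.
    public static void Init(LuceneDirectory indexDir)
    {
        lock (Sync)
        {
            reader = IndexReader.Open(indexDir, true); // true = read-only
        }
    }

    public static IndexSearcher GetSearcher()
    {
        lock (Sync)
        {
            if (!reader.IsCurrent())
            {
                // Reopen returns a new reader that shares unchanged segments.
                IndexReader fresh = reader.Reopen();
                // The old reader is deliberately not closed here, because another
                // request may still be searching it. A real application needs
                // reference counting or a grace period before closing it.
                reader = fresh;
            }
            return new IndexSearcher(reader);
        }
    }
}
```

A controller action would call SharedSearcher.GetSearcher() per request instead of opening its own reader. The lock only guards the swap; the searches themselves run outside it.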
Xodarap
  • Thanks for your answer. This means that I should just put the IndexReader as a controller `public static` field? Also, how do I renew the IndexReader (when the index is updated)? :) Or are you saying that it's better to keep the writer around rather than the reader? – Maxim Zaslavsky Aug 17 '10 at 19:44
  • Yes, make it a public static field. Unless you will have multiple processes writing to the same location, I think it is better to persist the writer and use the NRT model to get your readers. If you decide to persist readers though, reader.IsCurrent() will tell you if the reader is current, and reader.Reopen() will reopen it. I added a code sample for the NRT style. – Xodarap Aug 17 '10 at 20:06
  • Isn't it necessary to close the IndexWriter afterwards, or is committing frequently enough? – jorgebg Jun 04 '12 at 18:06
  • @jorgebg: you should only need to close the writer when your app shuts down (in the general case) – Xodarap Jun 04 '12 at 23:00
  • @Xodarap - This post was really useful, thanks! In our MVC app we're not writing directly to the index because we didn't want to block, so we're using message queuing to handle the writes. Do you think this is a good strategy? We're making our Reader static and reopening it when it is not current. – Pandincus Aug 14 '12 at 16:15
  • @Pandincus: Yes, I think queuing is a good way to make it non-blocking (as long as you are willing to lose queued data in the event of a crash). – Xodarap Aug 18 '12 at 20:25

I would probably skip the caching. Lucene is very, very efficient, perhaps so efficient that searching again is faster than going through the cache.

Building the full index in Application_Start feels a bit off to me. It should probably run in its own thread so that it doesn't block other expensive startup activities.
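
As a rough sketch of what I mean, assuming .NET 4.5 or later for Task.Run: IndexBootstrap and its members are hypothetical names, and the two delegates stand in for your own "does the index exist?" check and your database-to-index code.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical startup helper: kicks off the initial index build on a
// thread-pool thread so Application_Start can return right away.
public static class IndexBootstrap
{
    // Exposed so search code can check whether the initial build has finished.
    public static Task InitialBuild { get; private set; }

    // indexExists and buildIndex are supplied by the caller; this class
    // knows nothing about Lucene itself.
    public static Task Start(Func<bool> indexExists, Action buildIndex)
    {
        InitialBuild = Task.Run(() =>
        {
            // Only rebuild when there is no index yet.
            if (!indexExists())
            {
                buildIndex();
            }
        });
        return InitialBuild;
    }
}
```

Application_Start would call IndexBootstrap.Start(...) and move on. Searches that arrive before InitialBuild completes need a fallback, such as returning an "index is being built" message. Keep in mind that ASP.NET can recycle the app pool while background work is still running, so the build should be safe to restart.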

Wyatt Barnett