
We have a site where users can enter the name of a city. We use Lucene.Net 2.1.0.3 as the search engine to look up cities that have already been created. As currently configured, Lucene does not recognise that Saint Jerome is the same as St. Jerome, or that Lake Phillip is the same as Lac Phillip.

Any tips on widening the search strategy for Lucene.Net?

kevinskio
  • Here is a similar question: http://stackoverflow.com/questions/3223637/how-to-perform-phonetic-and-aproximative-search-in-lucene-net – agent-j Jul 19 '11 at 15:13


I've read a bit about synonym handling and "sounds like" matching (read: I have no hands-on experience with this yet). To me these look like two separate problems: abbreviation "synonyms" and "sounds like" matching.

Sounds Like

Soundex is an older algorithm designed to match misspellings of American names. A later algorithm, Double Metaphone, addresses some of Soundex's weaknesses. This library looks promising: http://sourceforge.net/projects/phonetixnet/
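To make the "sounds like" idea concrete, here is a minimal sketch of classic American Soundex in Python. It is not the phonetix.net library and not Lucene.Net code; the names `CODES` and `soundex` are just for illustration. It keeps the first letter and encodes the following consonants as up to three digits, collapsing runs of the same digit.

```python
# Classic American Soundex digit groups; vowels (and y) have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a 4-character Soundex key such as 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    prev = CODES.get(first, "")  # the first letter's code still blocks a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break up a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeat after a vowel is coded again
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


print(soundex("Lake"), soundex("Lac"))     # L200 L200
print(soundex("Saint"), soundex("St."))    # S530 S300
print(soundex("Phillip"), soundex("Filip"))  # P410 F410
```

With this sketch, "Lake" and "Lac" share a key, so a phonetic key would catch the Lake/Lac Phillip case, while "Saint" and "St." do not, which fits the idea that abbreviations are a separate problem. Soundex also keeps the first letter literally, so "Phillip" and "Filip" still differ; treating "ph" as "f" is one of the things Double Metaphone improves. In Lucene.Net I'd guess the key would be produced by a custom token filter so indexed names and queries get the same encoding, but that is an assumption I haven't tried.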

Abbreviation Synonyms

A generic synonym system might exist, but I would expect it to expand "Garden City" into something like "Plot Town" or "Patch Burg". I suspect you'll get better results from your own domain-specific synonym list.

Words like 'Saint' ('St.') and 'Mount' ('Mt.') seem best handled as synonyms. This article proposes a fairly simple approach to custom synonyms: http://www.codeproject.com/KB/cs/lucene_custom_analyzer.aspx .
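A minimal sketch of the domain-specific synonym idea, in Python rather than as a Lucene.Net analyzer: fold accents, split on non-word characters (which drops the dot in "St."), and map each known variant to one canonical token. The `CANONICAL` table and the function names are hypothetical; you would build the table from the place names you actually see. Note that "St" can also mean "Street", which is one more reason a domain-specific list beats a generic one.

```python
import re
import unicodedata

# Hypothetical, domain-specific variant table: each key is rewritten to its value.
# Canonical words ("saint", "mount", "lake") pass through unchanged via .get().
CANONICAL = {
    "st": "saint",
    "mt": "mount",
    "lac": "lake",
}


def fold_accents(text: str) -> str:
    """Strip combining accents (e.g. 'Jérôme' -> 'Jerome') and lower-case."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def normalize(city: str) -> list[str]:
    """Tokenize a city name and map each token to its canonical form."""
    tokens = re.findall(r"\w+", fold_accents(city))  # "st." becomes "st"
    return [CANONICAL.get(t, t) for t in tokens]


# The same function has to run on indexed names and on query text.
print(normalize("St. Jérôme"))   # ['saint', 'jerome']
print(normalize("Lac Phillip"))  # ['lake', 'phillip']
```

The key design choice is applying the same normalization to the indexed city names and to the user's query (in Lucene terms, the same analyzer at index and search time); otherwise "St. Jerome" in the index and "Saint Jerome" in the query still won't meet.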

agent-j
  • Thanks for your help. We are using the query parser, which the author of the CodeProject article does not recommend, but I'm sure we can adapt something – kevinskio Jul 21 '11 at 12:27