
I want a search for phan to match elephant.

Right now, if I query value:*phan* it works, so I tried this in the query analyzer:

<analyzer type="query">
    <!-- tokenizer and other filters omitted from this excerpt -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="(.+)" replacement="*$1*" replace="all" />
</analyzer>

But then the query becomes the literal term "*phan*" instead of a wildcard query.

How can I do that?

user3113427

1 Answer


To make Solr find documents by parts of words, have a look at the NGramTokenizer or the EdgeNGramTokenizer. Since you need to match parts in the middle of a word, the NGramTokenizer is the one you want. If matching the start or end of the word were enough, EdgeNGram would be preferable, because it produces fewer terms in the index.
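The difference between the two can be sketched in a few lines of Python. This is a simplified model, not Solr's code: the helper names ngrams and edge_ngrams are hypothetical, and the gram sizes 3 to 5 are an assumed choice (in Solr the corresponding settings are minGramSize and maxGramSize). It shows which terms each approach would store for elephant, and whether phan is among them.

```python
def ngrams(term, min_gram, max_gram):
    """Every substring of term whose length lies in [min_gram, max_gram],
    roughly what an n-gram filter would put into the index."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(term) - size + 1):
            grams.append(term[start:start + size])
    return grams


def edge_ngrams(term, min_gram, max_gram):
    """Only the leading substrings (prefixes) of term in the same size range,
    roughly what an edge n-gram filter would put into the index."""
    return [term[:size] for size in range(min_gram, min(max_gram, len(term)) + 1)]


indexed = set(ngrams("elephant", 3, 5))
print("phan" in indexed)                             # True
print("phan" in set(edge_ngrams("elephant", 3, 5)))  # False
print(edge_ngrams("elephant", 3, 5))                 # ['ele', 'elep', 'eleph']
print(len(ngrams("elephant", 3, 5)))                 # 15
```

With these assumed sizes, the n-gram version stores 15 terms for elephant, and a 20-character word would produce 51 (18 + 17 + 16). Widening the size range grows that count quickly, which is the index-size price paid for matching in the middle of a word.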

A good example can be found here on SO in the question "Apache solr search part of the word".

Why index time rather than query time?

Lucene, and therefore Solr, is not designed for searches with leading wildcards. Even a search for *foo is likely to perform badly, let alone *foo*. You can read up on this in the Lucene FAQ entry 'What wildcard search support is available from Lucene?':

Leading wildcards (e.g. *ook) are not supported by the QueryParser by default. As of Lucene 2.1, they can be enabled by calling QueryParser.setAllowLeadingWildcard( true ). Note that this can be an expensive operation: it requires scanning the list of tokens in the index in its entirety to look for those that match the pattern.
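Why a leading wildcard forces a full scan can be illustrated with a toy sorted term dictionary. This is a simplified model, not Lucene's actual implementation (which is more sophisticated); the function names and the sample terms are hypothetical. A trailing wildcard can binary-search to its starting point in the sorted list, while a leading wildcard has no starting point and must test every term.

```python
import bisect
import fnmatch


def prefix_query(sorted_terms, prefix):
    """Trailing wildcard such as 'ele*': binary-search to the first term >= prefix,
    then walk forward only while terms still share the prefix.
    Returns (matches, number_of_terms_examined)."""
    i = bisect.bisect_left(sorted_terms, prefix)
    matches, examined = [], 0
    while i < len(sorted_terms):
        examined += 1
        if not sorted_terms[i].startswith(prefix):
            break
        matches.append(sorted_terms[i])
        i += 1
    return matches, examined


def leading_wildcard_query(sorted_terms, pattern):
    """Leading wildcard such as '*phan*': the sort order gives no starting point,
    so every term in the dictionary has to be tested against the pattern."""
    matches, examined = [], 0
    for term in sorted_terms:
        examined += 1
        if fnmatch.fnmatchcase(term, pattern):
            matches.append(term)
    return matches, examined


terms = sorted(["dog", "elegant", "elephant", "element", "phantom", "sycophant", "zebra"])
print(prefix_query(terms, "ele"))              # (['elegant', 'element', 'elephant'], 4)
print(leading_wildcard_query(terms, "*phan*"))  # (['elephant', 'phantom', 'sycophant'], 7)
```

Here the prefix query examines 4 of 7 terms while the leading wildcard examines all 7; in a real index with millions of distinct terms, that gap is what makes *foo* expensive.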

The SO question "Understanding Lucene leading wildcard performance" has a more detailed write-up on this topic.

cheffe
  • I want to apply it in the query analyzer, not the index analyzer. Can I do that? – user3113427 Dec 23 '13 at 07:10
  • I don't understand why such a simple thing isn't included in Solr. With NGram and EdgeNGram, if I have a 20-character word, the system has to index 15 more terms just to support partial matching. Is that efficient? – user3113427 Dec 23 '13 at 07:27
  • It is more efficient to _index_ the solution for a search if you can. You invest the required computation only once, at index time. If you do it _every_ time at query time, you will face higher CPU usage and poorer search performance. – cheffe Dec 23 '13 at 08:06
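The trade-off in the last comment can be made concrete with a minimal sketch of an index-time n-gram inverted index. The names build_ngram_index and search, the sample documents, and the gram sizes 3 to 5 are all hypothetical; this models the idea, not Solr's internals. Building the index touches each word once; after that, finding a fragment is a single dictionary lookup instead of a pattern test against every term on every query.

```python
from collections import defaultdict


def build_ngram_index(docs, min_gram=3, max_gram=5):
    """Index time (done once): map every n-gram of every word to the ids of
    the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            for size in range(min_gram, max_gram + 1):
                for start in range(len(word) - size + 1):
                    index[word[start:start + size]].add(doc_id)
    return index


def search(index, fragment):
    """Query time (done per search): one dictionary lookup, no scan.
    Only finds fragments whose length lies within [min_gram, max_gram]."""
    return sorted(index.get(fragment.lower(), set()))


docs = {1: "The elephant sleeps", 2: "A phantom appears", 3: "Zebra crossing"}
index = build_ngram_index(docs)
print(search(index, "phan"))  # [1, 2]
print(search(index, "ZEB"))   # [3]
```

Note the limitation built into this sketch: a fragment longer than max_gram (for example the whole word elephant) is not in the index at all, so the chosen size range has to cover the fragment lengths users will actually type.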