12

By default, Lucene treats "Word" and "word" as different terms. How can I make Lucene searches case-insensitive?

Stephen C
Eugeny89

4 Answers

12

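The easiest approach is to lowercase all searchable content as well as the queries. See the LowerCaseFilter documentation. Take care with wildcard queries: they bypass the Analyzer, so the pattern is not lowercased by the analysis chain and has to match the case of the indexed terms (the classic QueryParser in older Lucene versions lowercases such terms itself by default).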

If you need both behaviours, you can index the same content in separate fields, for example one lowercased field for case-insensitive search and one that keeps the original case.
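The idea of lowercasing on both sides can be sketched without Lucene at all. The toy inverted index below is only an illustration: the class `LowercaseIndexSketch` and its `analyze`, `add` and `search` methods are made-up names, not Lucene API, and `analyze` merely stands in for what a tokenizer plus LowerCaseFilter would do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical, not Lucene): terms are lowercased
// at index time and again at query time.
class LowercaseIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in for an analyzer: split on anything that is not a letter
    // or digit, then lowercase each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Index a document under every analyzed term it contains.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // The query goes through the same analysis, so "WORD" and "word"
    // look up the same postings list.
    Set<Integer> search(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        LowercaseIndexSketch index = new LowercaseIndexSketch();
        index.add(1, "A Word in title case");
        index.add(2, "another word in lower case");
        System.out.println(index.search("WORD")); // prints [1, 2]
    }
}
```

The point of the sketch is that indexing and searching share one normalization step; if only one side lowercases, mixed-case terms stop matching.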

Johan Sjöberg
7

The StandardAnalyzer applies a LowerCaseFilter, which makes "Word" and "word" the same term. Simply pass the same analyzer to both your IndexWriter and your QueryParser. For example:

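// Assumes dir is a Directory, mlf an IndexWriter.MaxFieldLength, and field the default field name.
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
// Use the same analyzer for indexing and for query parsing.
IndexWriter writer = new IndexWriter(dir, analyzer, true, mlf);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);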
WhiteFang34
3

In addition to using the StandardAnalyzer, which includes a LowerCaseFilter and a filter for common English words (such as "the"), make sure you build your documents with TextField rather than StringField. A StringField is indexed as a single unanalyzed token, so it is only suitable for exact matches.
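To see why the field type matters, here is a rough model of the difference, written in plain Java rather than against the Lucene API. The class `FieldKindSketch` and both methods are hypothetical names; they only imitate the assumed behaviour of an analyzed field versus a verbatim one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model (hypothetical, not the Lucene API) of an analyzed field
// versus a verbatim field.
class FieldKindSketch {
    // TextField-like: the value is split into tokens and each token is lowercased.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String raw : value.split("\\s+")) {
            if (!raw.isEmpty()) {
                terms.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // StringField-like: the whole value is kept as one term, case preserved.
    static List<String> verbatimTerms(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String title = "Lucene In Action";
        String query = "lucene"; // what a lowercasing query analyzer would produce

        System.out.println(analyzedTerms(title).contains(query)); // prints true
        System.out.println(verbatimTerms(title).contains(query)); // prints false
        System.out.println(verbatimTerms(title).contains(title)); // prints true
    }
}
```

In this model the verbatim field only matches the complete original value, which is why a lowercased query term never finds it.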

Luchio
0

If you are using Solr, add a LowerCaseFilterFactory to the fieldType of that field in schema.xml. For example:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>

    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>