How to make Lucene case-insensitive

Question

By default, the words "Word" and "word" are not treated as the same. How can I make Lucene case-insensitive?

score 12 · Accepted Answer · Johan Sjöberg · 2011-04-01T11:54:36.130

The easiest approach is to lowercase all searchable content as well as the queries; see the LowerCaseFilter documentation. You could also use wildcard queries for case-insensitive search, since they bypass the Analyzer.

If preferred, you can also store the content in several fields, each with a different case configuration (for example, one lowercased and one with the original case).
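To make the two suggestions above concrete, here is a toy, JDK-only sketch. It is not Lucene's API, and the class and method names are hypothetical. It keeps two "fields", one with tokens as written and one lowercased at index time. A case-insensitive search lowercases the query as well, which is the job a LowerCaseFilter does when the same analysis runs on both sides.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (NOT Lucene's API): each "field" is a tiny inverted index
// mapping a token to the ids of the documents that contain it.
class TwoFieldIndex {
    private final Map<String, Set<Integer>> exact = new HashMap<>(); // original case
    private final Map<String, Set<Integer>> lower = new HashMap<>(); // lowercased copy

    void add(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue; // leading whitespace yields an empty piece
            exact.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            // Lowercase at index time, like a LowerCaseFilter in an analyzer chain.
            lower.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(docId);
        }
    }

    // Case-sensitive search: the query term is looked up as-is.
    Set<Integer> searchExact(String term) {
        return exact.getOrDefault(term, Collections.emptySet());
    }

    // Case-insensitive search: the query is lowercased too, matching the index side.
    Set<Integer> searchIgnoreCase(String term) {
        return lower.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TwoFieldIndex index = new TwoFieldIndex();
        index.add(1, "Word of the day");
        index.add(2, "one more word");
        System.out.println(index.searchExact("word"));      // [2]
        System.out.println(index.searchIgnoreCase("WORD")); // [1, 2]
    }
}
```

Lowercasing with `Locale.ROOT` avoids locale-dependent surprises such as the Turkish dotless i.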

edited Apr 01 '11 at 11:54

answered Apr 01 '11 at 11:38

Johan Sjöberg

a line or two of sample source code can make your answer complete – Aqeel Ashiq Nov 23 '17 at 13:02

score 7 · Answer 2 · answered Apr 01 '11 at 12:01

StandardAnalyzer applies a LowerCaseFilter, which makes "Word" and "word" the same. You can simply pass it to both your IndexWriter and your QueryParser. For example (Lucene 3.0):

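// Use the same analyzer for indexing and for parsing queries
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
// mlf is an IndexWriter.MaxFieldLength, e.g. IndexWriter.MaxFieldLength.UNLIMITED
IndexWriter writer = new IndexWriter(dir, analyzer, true, mlf);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);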
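Why the same analyzer should go to both IndexWriter and QueryParser can be shown with a toy, JDK-only sketch. The names are hypothetical, it is not Lucene's API, and the tokenizer only loosely imitates StandardAnalyzer by splitting on non-alphanumeric characters. Lowercasing on the index side alone is not enough: a query analyzed without lowercasing looks for "WORD", which was never indexed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Toy model (NOT Lucene's API): an "analyzer" is just a function from text to tokens.
class SameAnalyzerDemo {
    // Split on runs of non-letter/non-digit characters, optionally lowercasing each token.
    static List<String> tokenize(String text, boolean lowercase) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (t.isEmpty()) continue; // a leading separator yields an empty first piece
            tokens.add(lowercase ? t.toLowerCase(Locale.ROOT) : t);
        }
        return tokens;
    }

    // Loosely like StandardAnalyzer: tokenize, then lowercase.
    static final Function<String, List<String>> LOWERCASING = t -> tokenize(t, true);
    // Tokenize only; case is preserved.
    static final Function<String, List<String>> CASE_PRESERVING = t -> tokenize(t, false);

    // True if every query token appears among the document's indexed tokens.
    static boolean matches(String doc, String query,
                           Function<String, List<String>> indexAnalyzer,
                           Function<String, List<String>> queryAnalyzer) {
        return new HashSet<>(indexAnalyzer.apply(doc)).containsAll(queryAnalyzer.apply(query));
    }

    public static void main(String[] args) {
        String doc = "Hello, Word!";
        // Same analyzer at index and query time: "WORD" becomes "word", which was indexed.
        System.out.println(matches(doc, "WORD", LOWERCASING, LOWERCASING));     // true
        // Mismatched analyzers: the index holds "word" but the query still says "WORD".
        System.out.println(matches(doc, "WORD", LOWERCASING, CASE_PRESERVING)); // false
    }
}
```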

score 3 · Answer 3 · answered Sep 29 '18 at 00:11

In addition to using StandardAnalyzer, which includes a LowerCaseFilter and a filter for common English words (such as "the"), you should also make sure you build your documents with TextField rather than StringField. StringField is meant for exact matches: its entire value is indexed as a single token without being analyzed.
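A toy, JDK-only illustration of the TextField/StringField difference (hypothetical names, not Lucene's API; stop-word removal is left out for brevity). The analyzed value is split into lowercase terms, while the StringField-style value stays one verbatim term, so a lowercase query term only matches the former.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model (NOT Lucene's API) of how the two field types turn a value into terms.
class TextVsStringField {
    // Like a TextField with a lowercasing analyzer: tokenized and lowercased.
    static Set<String> indexAsText(String value) {
        Set<String> terms = new HashSet<>();
        for (String t : value.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Like a StringField: the whole value is one term, case and spaces untouched.
    static Set<String> indexAsString(String value) {
        return Set.of(value);
    }

    public static void main(String[] args) {
        String title = "The Word";
        System.out.println(indexAsText(title).contains("word"));       // true
        System.out.println(indexAsString(title).contains("word"));     // false
        System.out.println(indexAsString(title).contains("The Word")); // true: exact value only
    }
}
```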

Luchio

score 0 · Answer 4 · answered Feb 15 '12 at 23:40

Add a LowerCaseFilterFactory to the fieldType for that field in schema.xml. For example:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>

    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

great answer, but the question is about Lucene, which doesn't have a schema like Solr does – Tom Saleeba Jan 30 '17 at 03:58
