Solr - case-insensitive search does not work

Question

I want to apply case-insensitive search to the field myfield in Solr.

I googled a bit and found that I need to apply LowerCaseFilterFactory to the field type, and that the field should be of type solr.TextField.

I applied that in my schema.xml and re-indexed the data, but even then my search seems to be case-sensitive.

Below is the search that I perform:

http://localhost:8080/solr/select?q=myfield:"cloud university"&hl=on&hl.snippets=99&hl.fl=myfield

Below is the definition of the field type:

 <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

And below is my field definition:

 <field name="myfield" type="text_en_splitting" indexed="true" stored="true" />

I'm not sure what is wrong with this. Please help me resolve it.

Thanks

EDIT

Debug Query

<lst name="debug">
    <str name="rawquerystring">
        "cloud university" AND guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <str name="querystring">
        "cloud university" AND guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <str name="parsedquery">
        +PhraseQuery(CC:"cloud univers") +guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <str name="parsedquery_toString">
        +CC:"cloud univers" +guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <lst name="explain">
        <str name="KSYS_20120805_1100">
            12.572915 = (MATCH) sum of:
              0.03595598 = weight(CC:"cloud univers" in 1560524), product of:
                0.51819557 = queryWeight(CC:"cloud univers"), product of:
                  8.881522 = idf(CC: cloud=4798 univers=625207)
                  0.05834536 = queryNorm
                0.06938689 = fieldWeight(CC:"cloud univers" in 1560524), product of:
                  1.0 = tf(phraseFreq=1.0)
                  8.881522 = idf(CC: cloud=4798 univers=625207)
                  0.0078125 = fieldNorm(field=CC, doc=1560524)
              12.536959 = (MATCH) weight(guid:268406b6-db65-49da-848a-c59248f170db in 1560524), product of:
                0.85526216 = queryWeight(guid:268406b6-db65-49da-848a-c59248f170db), product of:
                  14.658615 = idf(docFreq=1, maxDocs=1709587)
                  0.05834536 = queryNorm
                14.658615 = (MATCH) fieldWeight(guid:268406b6-db65-49da-848a-c59248f170db in 1560524), product of:
                  1.0 = tf(termFreq(guid:268406b6-db65-49da-848a-c59248f170db)=1)
                  14.658615 = idf(docFreq=1, maxDocs=1709587)
                  1.0 = fieldNorm(field=guid, doc=1560524)
        </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
        <double name="time">60.0</double>
        <lst name="prepare">
            <double name="time">1.0</double>
            <lst name="org.apache.solr.handler.component.QueryComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.FacetComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.HighlightComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.StatsComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.DebugComponent">
                <double name="time">0.0</double>
            </lst>
        </lst>
        <lst name="process">
            <double name="time">59.0</double>
            <lst name="org.apache.solr.handler.component.QueryComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.FacetComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.HighlightComponent">
                <double name="time">57.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.StatsComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.DebugComponent">
                <double name="time">2.0</double>
            </lst>
        </lst>
    </lst>
</lst>

The configuration is correct. Did you reload the cores after changing schema.xml? — Jayendra, Aug 22 '12 at 11:57
Can you add debugQuery=on to the URL and check the debug information to see what the query looks like? — Jayendra, Aug 22 '12 at 12:33
@Jayendra thanks for your suggestion. I edited my post and added the debug query results. Please advise. — meghana, Aug 22 '12 at 13:03
I didn't get that. I searched with `cloud university` and the Solr phrase query comes out as `cloud univers`. Could that be the reason? — meghana, Aug 22 '12 at 13:13
The query is correct and is being applied. university is transformed to univers because of the stemming filter. However, what is guid in the query for? Probably it's not matching, and that's why there are no results. — Jayendra, Aug 22 '12 at 13:13
guid is the unique ID of each document. I added it just to minimize the search results. — meghana, Aug 22 '12 at 13:14
Probably it's not matching then. Try without it and you should get search results. — Jayendra, Aug 22 '12 at 13:15
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/15665/discussion-between-meghana-and-jayendra) — meghana, Aug 22 '12 at 13:35
No, @Jayendra, it doesn't work after removing guid either. :( I don't know whether this is useful, but I'd like to tell you that the term in `myfield` is 'ClOUD UNIVERSITY'. — meghana, Aug 22 '12 at 13:38
What do you mean by "my search seems to be case-sensitive"? What result do you get, and what is the expected result? — Dorin, Aug 24 '12 at 13:12
@Dorin, in my field one term is `ClOUD UNIVERSITY` (all caps, with just one lowercase `l`). If I search with `cloud university`, it does not return this record. — meghana, Aug 27 '12 at 07:42

score 6 · Accepted Answer · answered Aug 27 '12 at 09:57

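You should put solr.LowerCaseFilterFactory before solr.WordDelimiterFilterFactory, because a case change in the middle of a word (a capital inside lowercase letters or vice versa, as in `ClOUD`) triggers the word delimiter's splitOnCaseChange and splits the token.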
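
As a sketch of that reordering, the field type from the question could be rearranged as follows. Only the position of `solr.LowerCaseFilterFactory` changes; every other filter and attribute is copied from the question. It assumes that, with `splitOnCaseChange="1"`, a value such as `ClOUD` is split at the `l`-to-`O` boundary before it is lowercased. The change has not been tested against the asker's index.

```xml
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <!-- Moved up: lowercase first, so the word delimiter never sees a case change -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <!-- Same reordering on the query side, to keep both analyzers consistent -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With the text lowercased first, `splitOnCaseChange` no longer has anything to act on, so mixed-case values such as `PowerShot` stay as a single token. The data has to be re-indexed after the schema change for the index-side analyzer to take effect.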

Bob Yoplait


score 1 · Answer 2 · answered Aug 27 '12 at 09:48

I recommend using the Analysis tool to see how the expression is indexed and how it is searched: http://localhost:8983/solr/admin/analysis.jsp?highlight=on

I think there might be a problem with the WordDelimiterFilterFactory (its settings differ between the query and index analyzers), but this is just a guess.

In the tool, select the field type text_en_splitting, then enter ClOUD UNIVERSITY as the index field value and cloud university as the query field value. Also select Verbose output and see what you get.
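
If the analysis output shows `ClOUD` being broken into parts on the index side, one way to check whether the word delimiter is responsible is to compare against a stripped-down field type in the same tool. The sketch below is hypothetical: the name `text_en_nosplit` is made up for illustration, the stop word, synonym, and stemming filters are left out on purpose, and the only setting changed from the question's word delimiter configuration is `splitOnCaseChange="0"`.

```xml
<!-- Hypothetical test type for the analysis tool; not part of the original schema -->
<fieldType name="text_en_nosplit" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same attributes as the question's index analyzer, except splitOnCaseChange -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same attributes as the question's query analyzer, except splitOnCaseChange -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Selecting this type in the analysis tool with the same two input values should show whether the index and query token streams line up once case-change splitting is out of the picture.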

Thanks @Dorin, I will surely try this out and let you know. :) — meghana, Sep 13 '12 at 13:28
