5

I want to apply case-insensitive search for field myfield in solr.

I googled a bit for that , and i found that , i need to apply LowerCaseFilterFactory to Field Type and field should be of solr.TextFeild.

I applied that in my schema.xml and re-index the data, then also my search seems to be case-sensitive.

Below is search that i perform.

http://localhost:8080/solr/select?q=myfield:"cloud university"&hl=on&hl.snippets=99&hl.fl=myfield

Below is definition for field type

 <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

and below is my field definition

 <field name="myfield" type="text_en_splitting" indexed="true" stored="true" />

Not sure , what is wrong with this. Please help me to resolve this.

Thanks

EDIT

Debug Query

<lst name="debug">
    <str name="rawquerystring">
        "cloud university" AND guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <str name="querystring">
        "cloud university" AND guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <str name="parsedquery">
        +PhraseQuery(CC:"cloud univers") +guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <str name="parsedquery_toString">
        +CC:"cloud univers" +guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <lst name="explain">
        <str name="KSYS_20120805_1100">
            12.572915 = (MATCH) sum of: 0.03595598 = weight(CC:"cloud univers" in 1560524), product of: 0.51819557 = queryWeight(CC:"cloud univers"), product of: 8.881522 = idf(CC: cloud=4798 univers=625207) 0.05834536 = queryNorm 0.06938689 = fieldWeight(CC:"cloud univers" in 1560524), product of: 1.0 = tf(phraseFreq=1.0) 8.881522 = idf(CC: cloud=4798 univers=625207) 0.0078125 = fieldNorm(field=CC, doc=1560524) 12.536959 = (MATCH) weight(guid:268406b6-db65-49da-848a-c59248f170db in 1560524), product of: 0.85526216 = queryWeight(guid:268406b6-db65-49da-848a-c59248f170db), product of: 14.658615 = idf(docFreq=1, maxDocs=1709587) 0.05834536 = queryNorm 14.658615 = (MATCH) fieldWeight(guid:268406b6-db65-49da-848a-c59248f170db in 1560524), product of: 1.0 = tf(termFreq(guid:268406b6-db65-49da-848a-c59248f170db)=1) 14.658615 = idf(docFreq=1, maxDocs=1709587) 1.0 = fieldNorm(field=guid, doc=1560524)
        </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
        <double name="time">60.0</double>
        <lst name="prepare">
            <double name="time">1.0</double>
            <lst name="org.apache.solr.handler.component.QueryComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.FacetComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.HighlightComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.StatsComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.DebugComponent">
                <double name="time">0.0</double>
            </lst>
        </lst>
        <lst name="process">
            <double name="time">59.0</double>
            <lst name="org.apache.solr.handler.component.QueryComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.FacetComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.HighlightComponent">
                <double name="time">57.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.StatsComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.DebugComponent">
                <double name="time">2.0</double>
            </lst>
        </lst>
    </lst>
</lst>
meghana
  • 907
  • 1
  • 20
  • 45
  • The configuration is correct. Did you reload the cores after the changes to the schema xml. – Jayendra Aug 22 '12 at 11:57
  • yes @Jayendra, i reload then after changes – meghana Aug 22 '12 at 12:15
  • can you add debugQuery=on in the url and check the debug information as to what the query looks like. – Jayendra Aug 22 '12 at 12:33
  • @Jayendra thanks for your suggestion , i edited my post and added debug query results . please suggest me on that. – meghana Aug 22 '12 at 13:03
  • Sorry .. I dont see the edited post – Jayendra Aug 22 '12 at 13:08
  • sorry ... delayed in editing. – meghana Aug 22 '12 at 13:11
  • i didn't get that, i searched with `cloud university` and solr phrase query results in `cloud univers`. Is that can be reason ?? – meghana Aug 22 '12 at 13:13
  • the query is correct and getting applied. university is transformed to univers cause of stemming filter. However what is guid in query for ?? Probably its not matching thats why no results. – Jayendra Aug 22 '12 at 13:13
  • guid is unique id for each document... i applied it , just ti minimize search result. – meghana Aug 22 '12 at 13:14
  • probably its not matching then. Try without it and you would get search results. – Jayendra Aug 22 '12 at 13:15
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/15665/discussion-between-meghana-and-jayendra) – meghana Aug 22 '12 at 13:35
  • no... @Jayendra , it doesn't work by removing guid also... :( , and yes don't know if it can be useful or not. but i'll like to tell you that my search term is as 'ClOUD UNIVERSITY' in `myfield` – meghana Aug 22 '12 at 13:38
  • what do you mean by: "then also my search seems to be case-sensitive."??? What is the result you get, and what is the expected result ? – Dorin Aug 24 '12 at 13:12
  • @Dorin, in my field one term is like as `ClOUD UNIVERSITY` (all caps just one `l` in small case) , if i do search with `cloud university` then it's not returning me this record. – meghana Aug 27 '12 at 07:42

2 Answers2

6

You should put solr.LowerCaseFilterFactory before the word delimiter because caps in the middle of lower caps or vice versa triggers the word delimiter

Bob Yoplait
  • 2,421
  • 1
  • 23
  • 35
1

I recommend you should use Analysis tool and see how the expression is indexed and how the expression is searched. http://localhost:8983/solr/admin/analysis.jsp?highlight=on

I think there might be a problem with the WordDelimiterFilterFactory ( it is different in query and in index ), but this is just a guess.

Select in the tool field type text_en_splitting and enter at field value index ClOUD UNIVERSITY and at field value query cloud university. Also select Verbose output and see what you get.

Dorin
  • 2,482
  • 2
  • 22
  • 38