If I have a record with the keywords Chris Muench, I want to be able to match Mue or Chr. How can I do this with a Solr query? Currently I do the following:

$results = $solr->search('"'.Apache_Solr_Service::escape($_GET['textsearch']).'"~100', 0, 100, array('fq' => 'type:datacollection'));

It doesn't match Mue or Chr, but it does match Muench.

Schema:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="rocdocs" version="1.4">
  <types>
    <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
 </types>


 <fields>
    <field name="type" type="string" indexed="true" stored="true" required="true" />
    <field name="mongo_id" type="string" indexed="true" stored="true" required="true" />
    <field name="nid" type="int" indexed="true" stored="true" required="true" />
    <field name="keywords" type="text_general" indexed="true" stored="false" />
 </fields>

 <!-- Field to use to determine and enforce document uniqueness. 
      Unless this field is marked with required="false", it will be a required field
   -->
 <uniqueKey>mongo_id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>keywords</defaultSearchField>
 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="OR"/>
</schema>
Chris Muench
  • Related SO question, with additional tips, here: http://stackoverflow.com/questions/1974394/apache-solr-search-part-of-the-word/1976045#1976045 – Doug_Ivison Dec 01 '13 at 13:40

1 Answer

You can use wildcard queries, e.g. chr* or mue*, which would match.
This requires either the client to enter the query in this format or the application to modify the query before sending it.
Alternatively, you can generate prefix tokens at index time using solr.EdgeNGramFilterFactory, and these would match the records. For example, with a minimum gram size of 2, chris would generate ch, chr, chri, chris, so a search for any of these prefixes would match.
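
As a rough sketch of the EdgeNGram approach, a field type along the following lines could be added to the types section of the schema in the question. The name text_prefix and the gram sizes (2 to 15) are assumptions chosen for illustration and do not come from the original schema. The n-gram filter is applied only at index time, so a typed prefix such as chr is just tokenized and lowercased before being compared against the indexed prefixes.

```xml
<!-- Hypothetical field type: the name "text_prefix" and the gram sizes are illustrative assumptions -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits leading-edge prefixes of each token: chris yields ch, chr, chri, chris -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gram filter here: the typed prefix is only tokenized and lowercased -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The keywords field would then need to reference this type (type="text_prefix"), and existing documents would have to be reindexed. The trade-off is a larger index in exchange for not needing wildcards in the query.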

Jayendra
  • I tried: $results = $solr->search('"'.Apache_Solr_Service::escape($_GET['textsearch']).'*"~100', 0, 100, array('fq' => 'type:datacollection')); and it still doesn't match. I would prefer to do this in the search query and NOT use NGramFilterFactory. – Chris Muench Sep 04 '12 at 17:58
  • The issue with wildcard queries is that they don't undergo analysis at query time and hence may not match. Try searching in lower case, since your index-time analysis lowercases the tokens. – Jayendra Sep 04 '12 at 18:03
  • That didn't seem to help either. Do I need to do something in my schema? – Chris Muench Sep 04 '12 at 18:11
  • The schema seems fine, as the index-time analysis is just stop words and lowercasing, so Chris Muench should be indexed as chris muench. When you search for chr* mue* it should match the words easily. Can you query Solr directly and check? – Jayendra Sep 04 '12 at 18:17
  • It works now. I had the query in quotes. Is it possible to do a phrase query (quoted) and a regex? – Chris Muench Sep 04 '12 at 19:40